| MonkeyBrains.net/~rudy/example | Random examples |
3ware monitoring
| nagios_3ware using check_by_inet |
|---|
|
Nagios is a quick way to consolidate all those monitoring crons into one place. Having them spread across more than 2 machines can really bog you down. :) Often, we install 3ware cards. In a cluster (with an internal LAN) it is easier (for me) to set up inetd to answer queries than to set up SSH with keys and then turn on sudo for the 3ware scripts (tw_cli needs root access). Probably more secure as well... opening up one port to run one script as root is better than opening up ssh access from your monitoring box. OK, all that said:
I found a comprehensive Nagios_3ware checking script on roedie.nl. Download it, plop it in /root/, make a couple of edits, set up inetd, test it, and set up nagios to do check_by_inet. This can be used for any script you want to run as root or as any other user without any login credentials -- think "internal network" or "firewall that port" to keep the big ol' Internet out. |
Edit the script
You downloaded check_3ware from roedie.nl, but it needs a couple of edits to work as an inetd script. You need to set the PATH in the script yourself, because inetd doesn't do that for you (it isn't a shell). PATH needs to include the directories that hold awk and grep. I go ahead and hard-code the location of tw_cli as well.
# changes to check_3ware.sh
... Change the part that defines TWCLI ...
PATH=/usr/bin:/bin
TWCLI=/usr/local/bin/tw_cli
... Add this right BEFORE the case "$EXITCODE" in part of check_3ware.sh ...
# if we are calling via check_by_inet, prefix the exitcode
if [ "X$inetd_dummy" != "X" ]; then
echo -n "$EXITCODE "
fi
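Because inetd starts the script with a bare environment, a missing awk or grep tends to fail quietly. Here is a minimal sketch of a guard for that, assuming a hypothetical helper named require_tools (it is not part of the roedie.nl script). It always prints the "exitcode message" form, so it only suits the inetd case, and it uses 3, the Nagios code for UNKNOWN:

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the original check_3ware.sh.

# Check that every named tool can be found on the current PATH.
# On the first miss, print a "3 UNKNOWN: ..." line and return 3.
require_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "3 UNKNOWN: $tool not found in PATH=$PATH"
            return 3
        fi
    done
    return 0
}
```

One way to use it: right after the PATH= line, call require_tools awk grep || exit 3, so a bad PATH shows up in Nagios as UNKNOWN instead of as an empty reply.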
Set up inetd
Look through /etc/services and pick a service whose name you like and that isn't in use. Let's say you pick venus ... that means you'll get check_3ware.sh listening on port 2430 (the venus port). Here is your /etc/inetd.conf config:
# /etc/inetd.conf
venus stream tcp nowait.4 root /root/check_3ware.sh
The .4 means: don't let this script run more than 4 times per minute. Really, you should only be checking every hour or so --- no point in getting alerts about RAID every 10 minutes. You get one alert, and you head down to the data center and swap out the disk as soon as you can. Filling up your mailbox will just train you to ignore alerts. Side rant: a well-configured monitoring system ONLY emails you when you actually have to do something. Spurious alerts train the brain to treat alerts like spam.
Assuming you don't have any other inetd stuff running, you need to start it now...
/etc/init.d/openbsd-inetd restart
and to make it start up on reboot, you need to set up your boot script (rc3.d in CentOS, rc2.d in Debian, rc.d in BSD, good old rc.local, whatever).
Test!
# telnet host-with-3ware.example.com venus
Trying 10.10.10.2...
Connected to gibbon.
Escape character is '^]'.
0 UNITS OK: /c0/u0 OK -
Connection closed by foreign host.
The leading 0 is the exit code that check_by_inet will use.
check_by_inet
I saw there was a check_by_ssh ... lots of admin overhead there. ssh keys, setting up sudo for scripts to run as root, etc. That is where I got the idea to spend an hour making check_by_inet: because I am LAZY! I figure an hour spent on this script will pay itself off once I deploy to at least 20 boxes -- I am saving 3 minutes per box by using check_by_inet (I hope)! You, my friend, are saving even more time by just copying and pasting this schiznit.
Drop check_by_inet in /usr/lib/nagios/plugins on the nagios main server (not the client end). Next, set up the 'command' on the nagios server:
# /etc/nagios3/conf.d/check_by_inet.cfg
# The check by inet command! Takes the PORT number as its arg.
define command {
command_name check_by_inet
command_line /usr/lib/nagios/plugins/check_by_inet -p '$ARG1$' -H '$HOSTADDRESS$' -t 10
}
# check that 3ware card state via inet
define service {
hostgroup_name 3ware-servers
service_description 3WARE
check_command check_by_inet!2430
use generic-service
normal_check_interval 30
retry_check_interval 10
notification_interval 72 ; counted in interval_length units (60 s by default), so 72 = 72 minutes; use 720 for 12 hours
}
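The check_by_inet plugin itself is not reproduced here, so this is only a rough sketch of what such a plugin might do with the "exitcode message" line seen in the telnet test. It is not the author's script. The function names parse_inet_reply and check_by_inet are made up for illustration, the arguments are positional rather than the -p/-H/-t options used in the command definition, and it assumes a netcat (nc) that accepts -w for a timeout and keeps reading after end of input on stdin, which varies between netcat flavours:

```shell
#!/bin/sh
# Hypothetical sketch of a check_by_inet-style plugin; NOT the author's script.

# Split a reply such as "0 UNITS OK: /c0/u0 OK -" into code and message.
# Prints the message and returns the code; a malformed reply becomes UNKNOWN (3).
parse_inet_reply() {
    reply=$1
    code=${reply%% *}      # text before the first space
    message=${reply#* }    # text after the first space
    case "$code" in
        0|1|2|3)
            echo "$message"
            return "$code"
            ;;
        *)
            echo "UNKNOWN: unexpected reply: $reply"
            return 3
            ;;
    esac
}

# Read the first line served on host:port and pass it to parse_inet_reply.
# Usage: check_by_inet HOST PORT [TIMEOUT_SECONDS]
check_by_inet() {
    host=$1 port=$2 timeout=${3:-10}
    reply=$(nc -w "$timeout" "$host" "$port" </dev/null | head -n 1)
    if [ -z "$reply" ]; then
        echo "CRITICAL: no reply from $host:$port"
        return 2
    fi
    parse_inet_reply "$reply"
}
```

A real plugin would end by calling something like check_by_inet "$host" 2430 10 and exiting with its return value, since Nagios reads the state from the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the text from the first line of output.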
Goodness, it actually works. Took me more like 3 hours to figure all this out. :)
|