This section describes how to use the job_mon.sh script to monitor jobs such as cron jobs. job_mon.sh runs a job for you. If the job fails (exits with a status other than 0), job_mon.sh sends an alert message via Jabber. The alert contains information about the job and anything the job wrote to STDERR. Optionally, job_mon.sh can also send an alert when the job completes successfully. Its usage is simple:
./job_mon.sh [-s] -j "job [with or without args]"
The -j argument specifies the job or program for job_mon.sh to run. Enclose the job and any of its arguments in quotation marks so that they are passed as a single string. The optional -s flag requests an alert on success as well, and the -h option prints a help message.
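To make the run, capture and check cycle concrete, here is a minimal bash sketch of the behaviour described above. It is not the real job_mon.sh source: run_monitored and send_alert are hypothetical names, and the alert is printed to the terminal instead of being sent over Jabber.

```shell
#!/usr/bin/env bash
# Minimal sketch of the run/capture/check cycle described above.
# NOT the real job_mon.sh: run_monitored and send_alert are hypothetical
# names, and the alert is printed instead of being sent over Jabber.

send_alert() {
    # $1 = job string, $2 = status text, $3 = file holding the job's STDERR
    printf 'Service: %s\nStatus: %s\n' "$1" "$2"
    echo '------------Beg-Summary------------'
    cat "$3"
    echo '------------End-Summary------------'
}

run_monitored() {
    local OPTIND=1 opt job="" notify_success=0 errfile status
    while getopts "j:sh" opt; do
        case "$opt" in
            j) job=$OPTARG ;;
            s) notify_success=1 ;;
            h) echo 'usage: run_monitored [-s] -j "job [args]"'; return 0 ;;
            *) return 2 ;;
        esac
    done
    if [ -z "$job" ]; then
        echo "run_monitored: -j is required" >&2
        return 2
    fi

    errfile=$(mktemp) || return 2
    # Hand the quoted job string to a shell so its arguments are split as
    # they would be on the command line; only STDERR is captured.
    bash -c "$job" 2>"$errfile"
    status=$?

    # Success or failure is decided purely by the exit status.
    if [ "$status" -ne 0 ]; then
        send_alert "$job" "Job Failed: exit status $status" "$errfile"
    elif [ "$notify_success" -eq 1 ]; then
        send_alert "$job" "Job Completed Successfully" "$errfile"
    fi
    rm -f "$errfile"
    return "$status"
}
```

For example, `run_monitored -s -j "ls /tmp"` runs ls with its argument and prints a success alert; without -s, a successful job produces no alert at all.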
The job_mon.sh script can be downloaded from the tools folder. Once it is downloaded, make it executable (chmod a+x job_mon.sh) before running it. job_mon.sh relies on jabber_alert.pl, so that script must be available as well.
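Because job_mon.sh will also hold the sender's Jabber password, you may prefer to give access only to the account that owns the file rather than relying on a+x alone. The sketch below uses chmod 700 (read, write and execute for the owner, nothing for anyone else). It assumes your jobs run under the same account that owns the script; secure_install is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Make the script executable for its owner only (rwx------), so other
# local users cannot read the password stored inside it.
# Assumption: jobs run as the file's owner; secure_install is hypothetical.
secure_install() {
    local script=${1:-./job_mon.sh}
    chmod 700 "$script" || return 1
    ls -l "$script"
}
```

Calling `secure_install ./job_mon.sh` applies the permissions and lists the file so you can confirm the result.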
Before use, job_mon.sh must be configured with the JIDs of the alert sender and recipient and with the sender's password. Open job_mon.sh in an editor; the user configuration is near the top of the file. Set recipient_jid, sender_jid and sender_pw. Note that recipient_jid may be the same as sender_jid. Other optional settings can be entered below these three.
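Assuming the configuration block uses ordinary shell variable assignments (check your copy of the script), the three required settings might look like this once filled in. The values are placeholders, and check_config is a hypothetical helper, not part of job_mon.sh, that reports any required setting left empty.

```shell
#!/usr/bin/env bash
# Hypothetical filled-in configuration; only the variable names come from
# the documentation, the values are placeholders.
recipient_jid="me@somedomain.com"
sender_jid="monitor_user@somedomain.com"
sender_pw="change-me"

# Hypothetical check (not part of job_mon.sh): fail if any required
# setting is still empty.
check_config() {
    local name missing=0
    for name in recipient_jid sender_jid sender_pw; do
        if [ -z "${!name}" ]; then          # bash indirect expansion
            echo "job_mon config: $name is empty" >&2
            missing=1
        fi
    done
    return "$missing"
}
```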
Note that to use job_mon.sh, the script file must contain a Jabber user password; see the warning in the help message. Once you have entered your Jabber information in job_mon.sh, you can easily test it. Choose a simple program to run, such as a text editor, and launch it with job_mon.sh:
./job_mon.sh -j "nedit"
Your program should start. Now, from another terminal, find and kill the program you started with job_mon.sh:
ps -A | grep nedit
>25430 pts/1 00:00:00 nedit
kill 25430
You should receive a Jabber message that looks something like this:
[13:03:38] <monitor_user@somedomain.com>
Alert: Alert Message
Time: Wed, 28 Jul 13:08:48 UTC: -4.00
User: me
Host: machine.somedomain.com
Service: nedit
Status: Job Failed: nedit terminated with signal 15 (SIGTERM)
------------Beg-Summary------------
------------End-Summary------------
Subject: Monitor Alert
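The Status line reports signal 15 because kill sends SIGTERM unless told otherwise. The sketch below, which uses sleep as a stand-in for nedit, shows how bash sees such a job: a child killed by a signal returns an exit status of 128 plus the signal number, here 128 + 15 = 143. This illustrates the shell's convention only; it makes no claim about how job_mon.sh itself formats the message, and demo_sigterm_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Start a background job, kill it with the default signal, and decode
# the exit status that wait reports.
demo_sigterm_status() {
    local pid status=0
    sleep 30 &
    pid=$!
    kill "$pid"                  # kill sends SIGTERM (15) by default
    wait "$pid" || status=$?     # wait returns the job's exit status
    echo "exit status: $status"
    if [ "$status" -gt 128 ]; then
        # kill -l N prints the signal name for number N (e.g. TERM)
        echo "terminated by signal $((status - 128)) ($(kill -l "$((status - 128))"))"
    fi
}
```

Running demo_sigterm_status prints an exit status of 143 followed by "terminated by signal 15 (TERM)".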
I use job_mon.sh to monitor my backup jobs. For example, I run a weekly backup to tape, and my fcron entry uses job_mon.sh:
0 2 * * 6 /home/scripts/job_mon.sh -s -j "/home/scripts/tape_backup_full.sh"
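In the standard crontab format, which fcron also accepts, the schedule 0 2 * * 6 means minute 0 of hour 2 on any day of the month, in any month, when the day of the week is 6 (Saturday), so the job starts at 02:00 every Saturday. The hypothetical helper below splits such an entry into its five time fields and the command, to make the layout explicit.

```shell
#!/usr/bin/env bash
# Split a crontab-style entry into its five schedule fields and the command.
# Day-of-week runs 0-7, with both 0 and 7 meaning Sunday.
explain_entry() {
    local minute hour dom month dow cmd
    local -a days=(Sunday Monday Tuesday Wednesday Thursday Friday Saturday Sunday)
    # read assigns the rest of the line, spaces included, to the last name.
    read -r minute hour dom month dow cmd <<<"$1"
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
    if [[ $dow =~ ^[0-7]$ ]]; then
        echo "day-of-week $dow is ${days[$dow]}"
    fi
    echo "command=$cmd"
}
```

Note that the quotes around the -j argument are kept as part of the command text; it is the shell that cron (or fcron) starts which later removes them and passes the job to the wrapper as one string.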
The morning after the backup runs, I receive an alert reporting either success or failure. For example, this entry recently produced the following Jabber success alert:
[03:15:39] <monitor_user@somedomain.com>
Alert: Alert Message
Time: Sun, 25 Jul 03:15:38 UTC: -4.00
User: me
Host: myhost.mydomain.com
Service: /home/scripts/tape_backup_full.sh
Status: Job Completed Successfully
------------Beg-Summary------------
Rewinding tape
Tape rewind succeeded.
Mounting /boot
Boot mount succeeded.
Starting backups to tape...
Backing up: /home/
star: '/home/./me/.kde3.1/kdeinit-:0' unsupported file type 'socket'. Not dumped.
star: '/home/./me/.SB-sock' unsupported file type 'socket'. Not dumped.
star: 47590 blocks + 0 bytes (total of 1559429120 bytes = 1522880.00k).
star: The following problems occurred during archive processing:
star: Cannot: stat 0, open 0, read/write 0. Size changed 0.
star: Missing links 0, Name too long 0, File too big 0, Not dumped 2.
star: Processed all possible files, despite earlier errors.
Backups complete. Unmounting /boot
Boot unmount succeeded.
Backups complete!
------------End-Summary------------
Subject: Monitor Alert
job_mon.sh captures all output that the job sends to STDERR and displays it in the summary section of the alert. If you use job_mon.sh with your own shell scripts, you may wish to redirect STDOUT to STDERR so that the alert captures everything your script would otherwise print to STDOUT. For example, the following echo command writes its output to STDERR:
echo "Backups complete!" >&2
When a script containing this redirected command is run, the text "Backups complete!" is written to STDERR, which normally appears on the tty, and job_mon.sh captures it as STDERR output. Note that job success or failure is not determined by whether the job produces any error output. Rather, job_mon.sh reports failure or success based on the exit status of the job it runs.
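Instead of adding >&2 to every echo, a bash script can redirect all of its later STDOUT to STDERR at once with exec 1>&2. The sketch below wraps that in a subshell so the demo does not affect the calling shell; in a real script you would simply put exec 1>&2 near the top. example_job is a hypothetical name, and its argument is a placeholder for the real backup command. As the text above notes, only the exit status decides success or failure.

```shell
#!/usr/bin/env bash
# Hypothetical job whose messages all go to STDERR, so a wrapper that
# captures STDERR sees every line. "$@" stands in for the backup command.
example_job() {
    (
        exec 1>&2                    # from here on, STDOUT goes to STDERR
        echo "Starting backups..."
        if ! "$@"; then
            echo "Backup step failed"
            exit 1                   # non-zero status: reported as a failure
        fi
        echo "Backups complete!"     # status 0: reported as a success
    )
}
```

For instance, `example_job true` writes both messages to STDERR and exits 0, while `example_job false` writes "Backup step failed" and exits 1.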
© 2003 Will Kamishlian and Robert Norris
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/1.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.