AutoCloseOnNagiosRecoveryMessages
Problem Description
We use Nagios to check that our machines are up and working. Every time something unusual happens (swap use is too high, CPU load is above 10, and so on) it sends an e-mail with a subject like "** PROBLEM boxxor/CPU load is CRITICAL **". As soon as things go back to normal it sends another message, "** RECOVERY boxxor/CPU load is OK **". This creates two tickets in RT, which then have to be merged and closed manually. To make things easier, I adapted an existing script to merge ALL pending open/new PROBLEM tickets related to a given RECOVERY message and automatically close/resolve them.
History
- Mar 2004 - original version from Todd Chapman extracted from an email message
- Nov 2009 - Sunnavy uploads plugin to the CPAN
- Mar 2010 - Kamil's simplification of Todd's variant (requires Nagios3)
Solutions
Todd's original version
Description: Merge Into Existing Ticket on match
Condition: OnCreate
Action: User Defined
Custom action preparation code:
1;
Custom action cleanup code:
# If the subject of the ticket matches a pattern suggesting
# that this is a Nagios RECOVERY message AND there is
# an existing ticket (open or new) in the "General" queue with a matching
# "problem description" (that is not this ticket),
# merge this ticket into that ticket.
#
# Based on http://marc.free.net.ph/message/20040319.180325.27528377.en.html

my $problem_desc = undef;

my $Transaction = $self->TransactionObj;
my $subject = $Transaction->Attachments->First->GetHeader('Subject');
if ($subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/) {
    # This looks like a Nagios recovery message
    $problem_desc = $2;
    $RT::Logger->debug("Found a recovery msg: $problem_desc");
} else {
    return 1;
}

# Ok, now let's merge this ticket with its PROBLEM msg.
my $search = RT::Tickets->new($RT::SystemUser);
$search->LimitQueue(VALUE => 'General');
$search->LimitStatus(VALUE => 'new', OPERATOR => '=', ENTRYAGGREGATOR => 'or');
$search->LimitStatus(VALUE => 'open', OPERATOR => '=');

if ($search->Count == 0) { return 1; }
my $id = undef;
while (my $ticket = $search->Next) {
    # Ignore the ticket that opened this transaction (the recovery one...)
    next if $self->TicketObj->Id == $ticket->Id;
    # Look for Nagios PROBLEM warning messages...
    if ( $ticket->Subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ) {
        if ($2 eq $problem_desc) {
            # Aha! Found the PROBLEM ticket corresponding to this RECOVERY ticket
            $id = $ticket->Id;
            # Nagios may send more than one PROBLEM message, right?
            $RT::Logger->debug("Merging ticket " . $self->TicketObj->Id . " into $id because of problem description match.");
            $self->TicketObj->MergeInto($id);
            # Keep looking for more PROBLEM tickets...
        }
    }
}

$id || return 1;
# Auto-close/resolve this whole thing
$self->TicketObj->SetStatus( "resolved" );
1;
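The subject matching used by the scrip above can be tried outside RT. The following is a minimal standalone sketch, assuming subjects in the "** TYPE word - description STATE **" form that the scrip's two regular expressions expect. The sample subjects and the helper names recovery_desc and problem_desc are made up for illustration and are not part of RT or Nagios.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not part of RT). They reuse the two regular
# expressions from the scrip above.

# Return the problem description of a RECOVERY subject, or undef.
sub recovery_desc {
    my ($subject) = @_;
    return $subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/ ? $2 : undef;
}

# Return the problem description of a PROBLEM subject, or undef.
sub problem_desc {
    my ($subject) = @_;
    return $subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ? $2 : undef;
}

# Made-up subjects standing in for the open/new tickets in the queue.
my @open_subjects = (
    '** PROBLEM alert - boxxor/CPU load is CRITICAL **',
    '** PROBLEM alert - boxxor/swap is WARNING **',
    '** PROBLEM alert - boxxor/CPU load is CRITICAL **',
);
my $recovery = '** RECOVERY alert - boxxor/CPU load is OK **';

# Collect every PROBLEM subject whose description equals the recovery's.
my $wanted = recovery_desc($recovery);
my @matches;
for my $subject (@open_subjects) {
    my $desc = problem_desc($subject);
    next unless defined($desc) && defined($wanted) && $desc eq $wanted;
    push @matches, $subject;
}

print "recovery would be merged with: $_\n" for @matches;
```

Because the description is captured up to the final state word, the comparison is a plain string equality. If your Nagios subject template differs, both patterns need to be adjusted together.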
Extension from Sunnavy
Kamil's version for Nagios3 and newer
by Kamil Srot (kamil.srot at nLogy dot com) 26/03/2010
First of all - sorry for my coding, I don't know Perl at all :-( Feel free to upgrade the script and let me know :-)
I use Nagios3, which comes with handy predefined macros that make integration with RT much easier. Here is an example of the notification commands, defined in Nagios (commands.cfg):
# 'notify-host-by-rtemail' command definition
define command{
        command_name    notify-host-by-rtemail
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nEventID: $HOSTPROBLEMID$\nLastEventID: $LASTHOSTPROBLEMID$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }

# 'notify-service-by-rtemail' command definition
define command{
        command_name    notify-service-by-rtemail
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nEventID: $SERVICEPROBLEMID$\nLastEventID: $LASTSERVICEPROBLEMID$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
Note the $HOSTPROBLEMID$, $LASTHOSTPROBLEMID$, $SERVICEPROBLEMID$ and $LASTSERVICEPROBLEMID$ macros.
*PROBLEMID gets a new unique ID the first time a problem appears and stays constant until the final RECOVERY. A RECOVERY notification always has *PROBLEMID equal to 0, and its LAST*PROBLEMID is the *PROBLEMID of all the previous notifications for that problem.
I use code like this to process incoming e-mails, merge the corresponding tickets and close the open ones:
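The ID rule described here can be written down as a tiny function. This is only a sketch: correlation_id is a hypothetical helper name, and the notification sequence is made up. The LastEventID values on the PROBLEM entries are placeholders, since this rule never looks at them.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the ID that ties a notification to its problem.
# A RECOVERY carries EventID 0, so its LastEventID is used instead.
sub correlation_id {
    my ($event_id, $last_event_id) = @_;
    return $event_id == 0 ? $last_event_id : $event_id;
}

# Made-up notification sequence for a single problem with ID 42.
my @notifications = (
    { type => 'PROBLEM',  event_id => 42, last_event_id => 0  },
    { type => 'PROBLEM',  event_id => 42, last_event_id => 0  },
    { type => 'RECOVERY', event_id => 0,  last_event_id => 42 },
);

# Every notification of the same problem maps to the same ID.
my @ids = map { correlation_id($_->{event_id}, $_->{last_event_id}) } @notifications;

printf "%-8s -> %s\n", $notifications[$_]{type}, $ids[$_] for 0 .. $#ids;
```

Since all three notifications map to the same value, that value can serve as the key for finding the ticket to merge into.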
# get the mail body
my $T_Obj = $self->TicketObj;
my $AttachObj = $self->TransactionObj->Attachments->First;
my $content = $AttachObj->Content;

# extract EventID and LastEventID
my $val = 0;
my $EventID = undef;
my $LastEventID = undef;
if( $content =~ m/^\QEventID:\E\s*(\S+)\s*$/m ) { $EventID = $1; }
if( $content =~ m/^\QLastEventID:\E\s*(\S+)\s*$/m ) { $LastEventID = $1; }

# no EventID line: not a notification in the expected format, leave the ticket alone
return 1 unless defined $EventID;

if($EventID == 0) { $val = $LastEventID; }
else { $val = $EventID; }

# look for a ticket with the same EventID
my $TicketsObj = RT::Tickets->new($RT::SystemUser);
$TicketsObj->LimitQueue(VALUE => 'Monitoring');
$TicketsObj->LimitCustomField(CUSTOMFIELD => 'NagiosProblemID', OPERATOR => '=', VALUE => $val);

if ($TicketsObj->Count > 0) {
    # found!
    my $id = undef;
    my $ticket;
    while ($ticket = $TicketsObj->Next) {
        next if $self->TicketObj->Id == $ticket->Id;
        $id = $ticket->Id;
        last;
    }
    if ( $id ) {
        # ...merge into
        $self->TicketObj->MergeInto($id);

        # when EventID = 0 we close the parent ticket
        if($EventID == 0) { $self->TicketObj->SetStatus('resolved'); }

        # ...and exit
        return 1;
    }
}

# hmm, a new ticket.
# let it fall through into the queue
# and set NagiosProblemID
$self->TicketObj->AddCustomFieldValue( Field => 'NagiosProblemID', Value => $val, RecordTransaction=>0 );

# if it is a recovery, set it to resolved
if($EventID == 0) { $self->TicketObj->SetStatus('resolved'); }
1;
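To check the field extraction without a running RT, something like the following standalone sketch may help. It assumes a mail body in the layout produced by the notify-service-by-rtemail command above. The body text, the host and address values, and the helper name field are all made up for illustration.

```perl
use strict;
use warnings;

# Made-up RECOVERY mail body in the layout of notify-service-by-rtemail.
my $content = join "\n",
    '***** Nagios *****',
    '',
    'Notification Type: RECOVERY',
    '',
    'Service: CPU load',
    'Host: boxxor',
    'Address: 192.0.2.10',
    'State: OK',
    'EventID: 0',
    'LastEventID: 42',
    '';

# Hypothetical helper: return the value of a "Name: value" line, or undef.
# The pattern has the same shape as the ones used in the scrip above.
sub field {
    my ($body, $name) = @_;
    return $body =~ m/^\Q$name\E:\s*(\S+)\s*$/m ? $1 : undef;
}

my $event_id      = field($content, 'EventID');
my $last_event_id = field($content, 'LastEventID');
my $is_recovery   = defined($event_id) && $event_id == 0;

print "EventID=$event_id LastEventID=$last_event_id recovery=",
      ($is_recovery ? 'yes' : 'no'), "\n";
```

The ^ anchor together with the /m modifier is what keeps the EventID pattern from matching inside the LastEventID line.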