UPDATE 24.09.2015 Script version 2.20150924.4
- Script now handles dependency monitors correctly. Thanks to my colleague David Risch for this bug report.
- I have rewritten the logging: all detailed alert and monitor information can now be exported to CSV files with the new switch -ExportDataToCSV.
In big environments this script can potentially close hundreds of alerts, and that much information would be too large for an event log entry.
Please read the corresponding blog post
I created a script to help me with SCOM alert handling and management according to Microsoft's good practices. This script handles alerts:
- It closes all open SCOM alerts (ResolutionState <> 255) where the TimeRaised property is older than a certain age.
-> This will clean up my console if the Operators have forgotten to close alerts in time
- It closes all new SCOM alerts (ResolutionState = 0) where the TimeRaised property is older than the configured SCOM alert grooming age (default 7 days)
-> This will clean up my console if the Operators have forgotten to acknowledge or otherwise handle open alerts
- It closes alerts created by specific workflows after a configurable amount of time by using an XML file with custom alert handling rules
-> This allows you to configure exceptions to the rules above. For example, some alerts created by the AD MP rules are no longer relevant after X hours, so you can close them safely because they will be triggered again if the issue still exists.
See attached XML file
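The three closing rules above can be sketched roughly as follows. This is only an illustration in Python, not the script itself: the `Alert` fields, the `should_close` function, the 30-day limit for open alerts, the 4-hour custom rule and the `IN_PROGRESS` state are all assumptions made for the example. Only the ResolutionState values 0 and 255 and the 7-day default come from the description above. The sketch also assumes that an alert is closed as soon as any one of the rules matches.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

NEW = 0            # ResolutionState of a new alert
CLOSED = 255       # ResolutionState of a closed alert
IN_PROGRESS = 100  # hypothetical custom state: open, but already handled by an Operator


@dataclass
class Alert:
    name: str
    workflow: str            # hypothetical: name of the rule or monitor that raised the alert
    resolution_state: int
    time_raised: datetime


def should_close(alert, now, max_open_age, new_alert_age, custom_rules):
    """Return the reason why an alert qualifies for closing, or None."""
    if alert.resolution_state == CLOSED:
        return None
    age = now - alert.time_raised
    # Custom rules (from the XML file in the real script) are checked first,
    # because they usually close an alert earlier than the general rules.
    custom_age = custom_rules.get(alert.workflow)
    if custom_age is not None and age > custom_age:
        return "custom rule"
    if alert.resolution_state == NEW and age > new_alert_age:
        return "new alert too old"
    if age > max_open_age:
        return "open alert too old"
    return None


now = datetime(2015, 9, 24, 12, 0)
custom_rules = {"AD replication rule": timedelta(hours=4)}  # assumed example rule
alerts = [
    Alert("Disk full", "Disk monitor", NEW, now - timedelta(days=8)),
    Alert("Replication", "AD replication rule", IN_PROGRESS, now - timedelta(hours=5)),
    Alert("CPU high", "CPU monitor", IN_PROGRESS, now - timedelta(days=2)),
    Alert("Service down", "Service monitor", IN_PROGRESS, now - timedelta(days=31)),
]
reasons = [
    should_close(a, now, timedelta(days=30), timedelta(days=7), custom_rules)
    for a in alerts
]
for alert, reason in zip(alerts, reasons):
    print(alert.name, "->", reason)
```

In this example only the "CPU high" alert stays open: it is being handled, is younger than the assumed 30 days and has no custom rule.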
But the script also takes care of the causing monitors and their health state:
- If a monitor alert is closed by the script, it will automatically reset the causing monitor if necessary.
-> This will ensure that if the root cause has not been fixed the monitor will trigger a new alert during the next execution cycle.
- It analyzes monitor alerts closed during the last 24 hours. If a closed monitor alert is not tagged by the script, then an Operator has probably closed it manually. The script checks whether the causing monitor has been reset. If not, the script resets the monitor.
-> This handles the common mistake of Operators closing alerts without taking care of the causing monitor.
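The check for manually closed monitor alerts can be sketched like this. Again, this is only an illustration in Python under several assumptions: the `ClosedAlert` fields, the `SCRIPT_TAG` marker, the `monitors_to_reset` function and the health state strings are invented for the example, and the 24-hour window is measured from the time the alert was closed. How the real script tags its own alerts is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SCRIPT_TAG = "ClosedByScript"  # hypothetical marker the script leaves on alerts it closes


@dataclass
class ClosedAlert:
    monitor_id: str          # hypothetical: identifies the causing monitor
    is_monitor_alert: bool   # False for alerts raised by rules
    closed_at: datetime
    tag: str                 # hypothetical field holding the script's marker, if any


def monitors_to_reset(closed_alerts, monitor_health, now, window=timedelta(hours=24)):
    """Return the monitors whose alert was closed manually but which are still unhealthy."""
    to_reset = []
    for alert in closed_alerts:
        if not alert.is_monitor_alert:
            continue                      # rule alerts have no monitor to reset
        if now - alert.closed_at > window:
            continue                      # outside the analysis window
        if alert.tag == SCRIPT_TAG:
            continue                      # the script already handled this monitor
        still_unhealthy = monitor_health.get(alert.monitor_id) in ("Warning", "Error")
        if still_unhealthy and alert.monitor_id not in to_reset:
            to_reset.append(alert.monitor_id)
    return to_reset


now = datetime(2015, 9, 24, 12, 0)
closed_alerts = [
    ClosedAlert("m1", True, now - timedelta(hours=2), ""),           # closed manually
    ClosedAlert("m2", True, now - timedelta(hours=3), SCRIPT_TAG),   # closed by the script
    ClosedAlert("m3", True, now - timedelta(hours=30), ""),          # older than 24 hours
    ClosedAlert("m4", True, now - timedelta(hours=1), ""),           # monitor already healthy
    ClosedAlert("r1", False, now - timedelta(hours=1), ""),          # rule alert
]
monitor_health = {"m1": "Error", "m2": "Error", "m3": "Warning", "m4": "Success"}
result = monitors_to_reset(closed_alerts, monitor_health, now)
print(result)
```

Only monitor m1 qualifies here: its alert was closed manually within the window and the monitor is still in an unhealthy state.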