We are a self-hosted campus running a load-balanced environment with up to 15 app servers (10 are currently available). A few days ago our DB server became unavailable while Learn itself stayed up, and our current Pingdom checks did not catch it. For those of you using Pingdom to monitor your environment, would you mind sharing how you are presently monitoring it?
For example, our Pingdom environment was only checking the health status of a single app server. We are now thinking it might be best to script the entire logon process via Pingdom. Looking for feedback from other orgs as to how you monitor your environment.
Perhaps it would also be best to map out our entire environment and build our checks around the possible points of failure?
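To make the idea concrete, here is a rough sketch of a check that looks at every app server rather than just one, and that tells "server is down" apart from "server answers but the page content looks wrong" (which is roughly what a DB outage behind a running app server would look like). This is a standalone Python script, not anything from Pingdom's own API. The hostnames, the path, and the marker string are all hypothetical placeholders, and it assumes each app server can be reached directly without going through the load balancer.

```python
from typing import Callable, Dict, List, Tuple
from urllib.request import urlopen

# Hypothetical values: replace with your own hosts, a page that is built
# from database content, and a string that only appears when it renders.
APP_SERVERS = ["https://app01.example.edu", "https://app02.example.edu"]
CHECK_PATH = "/"
DB_MARKER = "Welcome"


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, body text). Any failure, including an HTTP
    error status raised by urlopen, is reported as status 0."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except Exception as exc:
        return 0, str(exc)


def check_servers(
    servers: List[str],
    path: str,
    marker: str,
    fetcher: Callable[[str], Tuple[int, str]] = fetch,
) -> Dict[str, str]:
    """Classify each server as 'ok', 'degraded', or 'down'."""
    results: Dict[str, str] = {}
    for base in servers:
        status, body = fetcher(base.rstrip("/") + path)
        if status != 200:
            results[base] = "down"        # no usable response at all
        elif marker not in body:
            results[base] = "degraded"    # responds, but expected content is missing
        else:
            results[base] = "ok"
    return results


if __name__ == "__main__":
    for server, state in check_servers(APP_SERVERS, CHECK_PATH, DB_MARKER).items():
        print(server, state)
```

The marker check is the part that matters for our incident: a plain "is the server answering" check passes even when the database behind it is gone, so the check needs to look for something that can only appear when a DB-backed page actually renders. A fully scripted logon would go further than this, but the same per-server loop would apply.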
Message was edited by: Joe Sepulveda