Over the weekend I’ve been in charge of monitoring one of our bigger portals, which consists of a database server (running SQL Server 2005) and multiple web servers (running IIS 6.0 and ASP.NET) connecting to it. The architecture of the system is the following:
I’ve chosen a really simple – and at first glance efficient – way of monitoring the system with the Windows Performance Monitor (also known as PerfMon):
- On the SQL server I’ve added a counter to show the Logical Connections (within the SQLServer:General Statistics performance object). This – unlike what its name would suggest – shows the actual open connections of the server. My experience is that this is usually the critical load spot. (I’ve seen CPU usage and memory usage become critical as well, but these are usually related to programming issues and are usually detected early, while testing or shortly after deploying the system.)
On the system I worked with, things started to get slow when logical connections were above 150-200 per second. The system could go up to about 800-900 per second; however, above 400 the response time was incredibly slow. These numbers are of course drastically influenced by the hardware running SQL Server.
- On the web servers I’ve added the Request Execution Time counter (within the ASP.NET performance object). This counter returns the time (in milliseconds) it took to execute the most recent successfully executed request. Peaks occurred often while monitoring the system; however, as long as these peaks are isolated and don’t reach a critical time (usually 0.5-1 s), no action has to be taken. I only had to recycle the app pool (or, when this didn’t help, restart IIS) when the response time started to stay above a few seconds for a longer period of time.
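If you log the counters to a CSV file instead of only watching the graph, the thresholds above can be applied automatically. The following is just a minimal Python sketch of that idea, not something I ran on the portal itself. The host name `DB01`, the sample values and the two threshold constants are made up for illustration, and the CSV layout (timestamp in the first column, one counter path per remaining column) is an assumption about what the exported log looks like.

```python
import csv
import io

# Assumed thresholds, loosely based on the numbers quoted above.
# They depend heavily on the hardware and have to be tuned per system.
CONNECTIONS_SLOW = 150      # the system started to get slow around here
CONNECTIONS_CRITICAL = 400  # response time was very slow above this


def classify_connections(value):
    """Map one Logical Connections sample to a coarse load level."""
    if value > CONNECTIONS_CRITICAL:
        return "critical"
    if value > CONNECTIONS_SLOW:
        return "slow"
    return "ok"


def read_counter_csv(text):
    """Parse CSV counter samples into {column name: [float, ...]}.

    Assumes a header row whose first column is the timestamp and whose
    remaining columns are counter paths. Blank or non-numeric cells
    (for example a sample that could not be collected) are skipped.
    """
    rows = list(csv.reader(io.StringIO(text.strip())))
    header, data = rows[0], rows[1:]
    series = {name: [] for name in header[1:]}
    for row in data:
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                continue
    return series


# Hypothetical log excerpt; host name and values are invented.
SAMPLE = r'''
"(PDH-CSV 4.0)","\\DB01\SQLServer:General Statistics\Logical Connections"
"01/01/2010 10:00:00.000","120"
"01/01/2010 10:00:15.000"," "
"01/01/2010 10:00:30.000","180"
"01/01/2010 10:00:45.000","450"
'''

if __name__ == "__main__":
    for name, values in read_counter_csv(SAMPLE).items():
        for value in values:
            print(name, value, classify_connections(value))
```

The two constants are the only part that really matters here; everything else is plumbing, and the limits are worth re-measuring whenever the database hardware changes.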
With the above counters set, monitoring worked like a charm and produced diagrams such as the following:
While I was monitoring, heavy traffic hit the site, and I was able to detect which servers were unable to deal with the load and needed action taken, such as service restarts. (On the screenshot above the first web server seems to respond slowly at times; however, spikes like these appeared randomly, so the behavior seemed mostly normal.)
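The difference between a harmless random spike and a server that really needs a recycle can be expressed as a simple rule: only react when Request Execution Time stays high for several samples in a row. Here is a rough Python sketch of that rule. The 3000 ms threshold and the five consecutive samples are my own assumptions standing in for "a few seconds" and "a longer period of time", and the sample lists are invented.

```python
def needs_action(samples_ms, threshold_ms=3000, min_consecutive=5):
    """Return True if execution time stayed above threshold_ms for at
    least min_consecutive samples in a row.

    Both defaults are assumptions for illustration and should be
    adjusted to the sampling interval and the system being watched.
    """
    run = 0  # length of the current streak of slow samples
    for value in samples_ms:
        if value > threshold_ms:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a fast sample ends the streak
    return False


# Invented Request Execution Time samples (milliseconds).
spiky = [200, 900, 150, 4000, 180, 700]                # isolated peaks only
sustained = [300, 3500, 4200, 5100, 3900, 4400, 800]   # slow for a while

if __name__ == "__main__":
    print("spiky server needs action:", needs_action(spiky))
    print("sustained server needs action:", needs_action(sustained))
```

With these inputs the spiky server is left alone and the sustained one is flagged, which matches how I read the graph by eye.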
This method seemed like a simple but efficient way to determine whether a server is overloaded. You might consider adding some other performance counters (see some useful ASP.NET counters here). Should you know of a better or more efficient, but not too complex, way of doing this, feel free to share it in the comments.