Server Monitoring Basics

One of the most important aspects of hosting a server (besides backups!) is monitoring. Knowing when your server is offline and needs manual intervention to bring it back online is key to running a server. Without proper monitoring and notifications in place, outages can go unnoticed, and those outages can cost businesses money and customers. You can’t always be on your server 24/7 making sure everything is working (well, you can, but you might find yourself very tired). That is why it is so important to have proper monitoring and notifications that keep your site online as much as possible. To set them up, you need to know what to monitor and how to monitor it.

The first task is to understand what you need to monitor. To do that, work through the following questions and tasks.

  • What applications are running that need monitoring? (Apache, Corosync/Pacemaker, DNS, DRBD, HAProxy, Memcached, MySQL, NFS, Nginx, NTP, SMTP, SSL certificates, etc.)
  • What tests can you run that will show the system in good and bad health?
  • Determine which service alerts are Critical (alerts showing that a service is down, or that something will cause downtime immediately) vs. Non-Critical (alerts for problems that are not causing downtime yet but could soon if not handled).
  • Review and tweak. Not every alert can be prevented, but some can. Review all of your alerts and look for underlying issues you can fix, so they stop alerting you at unwanted hours of the night!
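To make the questions above more concrete, here is a minimal Python sketch of a home-grown health check. Each test reports good or bad health and is tagged Critical or Non-Critical, following the split described above. It is an illustration only and does not replace the tools listed below. The class and function names are hypothetical, and so are the example checks: port 80 for a web server, port 3306 for MySQL, and a 90% disk threshold. Adjust all of them for your own server.

```python
import shutil
import socket
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    test: Callable[[], bool]  # returns True when the system is healthy
    critical: bool            # True: a failure means downtime right now


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Healthy if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def disk_below(path: str, max_used_percent: float) -> bool:
    """Healthy if the filesystem holding `path` is less than max_used_percent full."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100 < max_used_percent


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check and return one status line per check."""
    report = []
    for check in checks:
        if check.test():
            status = "OK"
        else:
            status = "CRITICAL" if check.critical else "NON-CRITICAL"
        report.append(f"{status}: {check.name}")
    return report


if __name__ == "__main__":
    # Hypothetical checks; change hosts, ports and thresholds to match your server.
    checks = [
        Check("web server (port 80)", lambda: tcp_port_open("localhost", 80), critical=True),
        Check("MySQL (port 3306)", lambda: tcp_port_open("localhost", 3306), critical=True),
        Check("root disk under 90%", lambda: disk_below("/", 90.0), critical=False),
    ]
    for line in run_checks(checks):
        print(line)
```

A script like this could run from cron, with CRITICAL lines routed to whatever notification channel you use. Dedicated monitoring software adds what this sketch lacks: scheduling, retries, escalation, and alert history.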

After you’ve determined what needs to be monitored, you need to choose monitoring software. There are a number of paid and free tools that will do the trick; some examples are below. You may find that a combination of them fits your needs.

  • Paid
    • Pingdom – External website monitoring, response time, and alerting
    • PagerDuty – Incident alerting
    • Nagios XI – Infrastructure monitoring solution
    • Server Density – Server/website monitoring
    • Datadog – Server monitoring
    • New Relic – Server and application monitoring
  • Free
    • collectd – System statistics collection
    • Graphite – Storing and rendering graphs of metric data
    • Munin – Resource monitoring tool
    • Nagios Core – Infrastructure monitoring solution
    • Zabbix – Enterprise monitoring platform
    • Icinga – Open source enterprise monitoring
    • Grafana – Metrics dashboard and graph editor
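Several of these tools can also accept metrics that you collect yourself. For example, Graphite’s Carbon daemon accepts a plaintext protocol: one metric per line in the form "metric.path value unix_timestamp", on TCP port 2003 by default. Below is a minimal Python sketch of sending data that way. The helper names are hypothetical, and so are the host and metric path in the usage note.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric as '<path> <value> <timestamp>' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" as a Unix timestamp
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003,
                     timeout: float = 5.0) -> None:
    """Send preformatted plaintext lines to a Carbon listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, send_to_graphite([graphite_line("servers.web01.disk.used_percent", 72.4)], host="graphite.example.com") would record one data point. Here graphite.example.com is a placeholder for your own Graphite server.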

The point is: start monitoring your applications and server now, and begin collecting historical metrics. Then, when you detect an outage, you will have data to help determine its cause. Even better, you may spot and prevent an issue before it causes an outage!
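Collecting history doesn’t have to wait for a full monitoring stack. Below is a minimal sketch, assuming Python 3 on the server. Each time it runs, it appends one timestamped disk-usage sample to a CSV file, so you end up with a simple record to look back on after an incident. The function name, file name, and cron schedule are hypothetical.

```python
import csv
import shutil
import time
from pathlib import Path


def record_disk_usage(csv_path: str, mount: str = "/") -> list:
    """Append one timestamped disk-usage sample for `mount` to csv_path.

    Writes a header row when the file does not exist yet and returns
    the data row that was written.
    """
    usage = shutil.disk_usage(mount)
    used_percent = round(usage.used / usage.total * 100, 2)
    row = [int(time.time()), mount, used_percent]

    path = Path(csv_path)
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["timestamp", "mount", "used_percent"])
        writer.writerow(row)
    return row


if __name__ == "__main__":
    # Hypothetical file name; use an absolute path when scheduling with cron, e.g.
    # */5 * * * * /usr/bin/python3 /path/to/this_script.py
    record_disk_usage("disk_usage.csv")
```

A CSV file is only a starting point. Once you outgrow it, the same samples can be fed into a tool such as Graphite or Munin for graphing.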

If you need assistance monitoring your server, let us know — we will be glad to point you in the right direction.