If you've gone through the effort of creating a website for your business, and you've created original, engaging content to attract new customers, you'll want to make sure that your website loads quickly for users, and is always accessible (i.e. doesn't have any downtime). This is where website monitoring tools can help.
Depending on how complicated your monitoring needs are, there are many different services available that can help you monitor your website and web applications at various levels of granularity. For example, there are simple tools that will alert you when your website is unreachable, and there are more complex tools that can alert you when specific services are seeing abnormally high latency.
In this article, we'll discuss the benefits of website monitoring, and introduce you to the different types of monitoring tools that are available.
Why Website Monitoring Tools are Important
Monitoring your website is an important step in ensuring that you retain your existing customers and attract new users. For example, if your business is featured on an episode of Shark Tank, and your website can't handle the increased traffic from all of the viewers, you won't just lose customers -- your entire reputation will suffer, and you'll lose credibility.
But monitoring the health and performance of your website isn't just important for small businesses experiencing their first rush of traffic -- it's even more important for well-established companies that want to satisfy their existing user base. Imagine what would happen if Google's search results started to take 10 seconds to load, or even worse, started timing out completely. The Site Reliability Engineering team at Google would want to know as soon as this started happening. To find out that quickly, they would set up automated alerts that notify them whenever the site is having issues, so that they can fix them as soon as possible. Without these alerts, the team would only learn about issues once customer complaints started to arrive, which is too late.
Since the Site Reliability Engineering team can't monitor every component manually, they have to use monitoring tools that provide easy-to-read dashboards that display information about website performance and alert them as soon as anything looks problematic.
There is too much at stake to ignore the importance of monitoring your website and business-critical services. Google understands this -- and in fact, the tech giant devotes enormous resources to ensuring that its websites remain highly available and lightning fast. A team of Google employees actually wrote a book, Site Reliability Engineering (freely available to read online), that has become a reference guide for site reliability engineering teams all over the world.
What not to expect from monitoring
Monitoring won't help solve all your problems. Its main value lies in the simple fact that if you don't know what your problem is, you can't fix it.
Monitoring cannot prevent your web servers from crashing when too many people visit your site. But it can help you detect whenever your website is receiving an abnormal amount of traffic or observing abnormally high latency, which can alert your engineers to investigate these issues before they begin to negatively impact many of your users.
As the number of digital businesses has grown over the decades, so has the number of monitoring solutions, as well as the types of monitoring available to help your business. You may find that you don't have the need or the budget for every single type of monitoring tool, so you should prioritize the kinds of issues that matter most to you, and choose a monitoring tool that helps address your top concerns. For small businesses with a basic website, there are even free solutions online that will simply alert you when your website is down, or loading more slowly than expected. (For example, you can use Uptime Robot to quickly set up a monitor that pings your website every 5 minutes, and sends you an alert when your website is down.)
If you have a basic website, and aren't expecting much traffic, then a tool like this is sufficient. For higher-volume sites where performance is more important, you'll want to consider some of the other options that are available.
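To make the idea of a basic uptime check concrete, here is a minimal sketch using only Python's standard library. It is not how Uptime Robot or any other product works internally; the function names check_once and classify, and the 3-second "slow" threshold, are assumptions chosen for illustration.

```python
import time
import urllib.request


def check_once(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send one HTTP GET and return (is_up, latency_seconds)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            is_up = 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, and timeouts are all subclasses of OSError.
        is_up = False
    return is_up, time.monotonic() - start


def classify(is_up, latency, slow_after=3.0):
    """Turn a check result into 'down', 'slow', or 'ok'."""
    if not is_up:
        return "down"
    return "slow" if latency > slow_after else "ok"
```

A scheduler (a cron job, for instance) could call check_once on your homepage every 5 minutes and send a notification whenever classify returns anything other than "ok".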
We'll walk through the various types of monitoring software systems, discuss what types of concerns they are designed to address, and explain what qualities to look for when selecting monitoring software.
Synthetic monitoring
Synthetic monitoring is the practice of sending simulated requests to your services or websites. It's also known as "proactive monitoring", because it helps you proactively detect issues with your services and applications that may or may not currently impact your end users.
For example, let's suppose that you run an e-commerce company, and you recently introduced a wishlist feature that none of your users currently use. Before you plan a big marketing push to highlight this new feature, you'll first want to make sure that it's working as expected. To do this, you can use a synthetic monitoring service to create requests that track the success and performance of a typical user's journey when using this feature. Synthetic monitoring can be as basic (pinging a URL) or as extensive as you like: you can record a series of transactions and instruct the software to simulate requests that follow those steps on a regular basis, and report the results back to you. This can help ensure that your wishlist feature is working as expected, and give your team time to address any errors, before announcing it to the world.
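The recorded-journey idea can be sketched as a list of named steps that are run in order and timed. This is only an illustration, not the API of any particular synthetic monitoring product: run_journey and the three step functions are hypothetical, and in practice each step would issue real HTTP requests or drive a browser.

```python
import time


def run_journey(steps):
    """Run (name, action) steps in order and stop at the first failure.

    Each action is a zero-argument callable that raises on failure.
    Returns one result dict per step that was attempted.
    """
    results = []
    for name, action in steps:
        start = time.monotonic()
        try:
            action()
            ok, error = True, None
        except Exception as exc:  # any failure ends the journey
            ok, error = False, str(exc)
        results.append({"step": name, "ok": ok, "error": error,
                        "seconds": time.monotonic() - start})
        if not ok:
            break
    return results


# Hypothetical stand-ins for the steps of the wishlist journey.
def log_in():
    pass


def add_to_wishlist():
    raise RuntimeError("HTTP 500 from /wishlist")


def view_wishlist():
    pass


report = run_journey([
    ("log in", log_in),
    ("add item to wishlist", add_to_wishlist),
    ("view wishlist", view_wishlist),
])
for row in report:
    print(row["step"], "ok" if row["ok"] else "FAILED: " + row["error"])
```

Stopping at the first failed step keeps the report focused on where the journey broke, rather than listing follow-on errors caused by the same problem.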
If you have customers across the world, you'll need to implement synthetic monitoring in order to ensure the quality of user experience across all relevant geographic locations. In that case, look for a synthetic monitoring tool that has testing nodes in all of the geographic locations you care about. Sending simulated requests from regions around the world can help you detect whether you need to make changes to your setup. For example, if you notice that your pages load very slowly in Seattle, where many of your users are located, you can look into switching to another CDN (Content Delivery Network) to reduce the page load times for users in that region.
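As a rough illustration of comparing regions, the sketch below flags any region whose median page load time is far above the typical region. The function name slow_regions, the factor of 2, and the sample numbers are all assumptions made up for this example.

```python
from statistics import median


def slow_regions(samples_ms, factor=2.0):
    """Return regions whose median load time exceeds factor x the typical region.

    samples_ms maps a region name to a list of page load times in milliseconds.
    """
    medians = {region: median(times) for region, times in samples_ms.items()}
    typical = median(medians.values())
    return sorted(r for r, m in medians.items() if m > factor * typical)


# Invented measurements from three hypothetical testing nodes.
samples = {
    "Seattle": [2100, 2300, 1900],
    "London": [400, 450, 420],
    "Tokyo": [500, 480, 520],
}
print(slow_regions(samples))  # ['Seattle']
```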
Many synthetic monitoring tools also offer something called a "waterfall analysis", which breaks down each request into its various subcomponents. This enables you to identify the steps that are taking the longest time to load, so that you can take measures to address any potential latency issues before they affect your real users.
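To show what a waterfall breakdown contains, here is a small sketch that takes the phases of a single request and reports when each one started and what share of the total time it used. It assumes the phases run strictly one after another; real waterfall charts also cover many resources loading in parallel. The phase names and timings are invented for the example.

```python
def waterfall(phases):
    """Given (phase, duration_ms) pairs, return one row per phase with its
    start offset and its share of the total request time."""
    total = sum(ms for _, ms in phases)
    rows, offset = [], 0.0
    for name, ms in phases:
        rows.append({
            "phase": name,
            "start_ms": offset,
            "duration_ms": ms,
            "share": ms / total if total else 0.0,
        })
        offset += ms
    return rows


def slowest(phases):
    """Return the name of the phase that took the longest."""
    return max(phases, key=lambda phase: phase[1])[0]


# Invented timings for one page request.
request_phases = [
    ("DNS lookup", 20),
    ("TCP connect", 30),
    ("TLS handshake", 50),
    ("time to first byte", 400),
    ("content download", 100),
]
for row in waterfall(request_phases):
    print(f'{row["phase"]:<20} starts at {row["start_ms"]:>5.0f} ms, {row["share"]:.0%} of total')
print("Slowest phase:", slowest(request_phases))
```

In this made-up request, most of the time is spent waiting for the first byte, which would point you toward the server rather than the network.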
For simple websites or blogs, synthetic monitoring is largely what you'll want to focus on, in order to make sure that your website is up and running. Many systems offer this functionality for free, or for a small fee.
Real-user monitoring (RUM)
Real-user monitoring (RUM), also known as "passive monitoring", tracks statistics about your real users' experiences and interactions with your services. RUM is considered passive, since it only helps you identify issues after they've occurred, by analyzing data from actual user logs. This method of collecting data from your actual users can help you surface valuable insights, such as when your users typically stop engaging with your content, or which web browsers encounter the most errors. You can then make tweaks to your product accordingly, to ensure the best possible experience for all of your users.
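One of the insights mentioned above, which browsers encounter the most errors, comes down to grouping real page-view records by browser. The sketch below assumes each record is a dictionary with a "browser" name and an "error" flag; that record shape and the sample data are assumptions for illustration, not the format of any specific RUM tool.

```python
from collections import Counter


def error_rate_by_browser(events):
    """Return {browser: fraction of page views that hit an error}."""
    views, errors = Counter(), Counter()
    for event in events:
        views[event["browser"]] += 1
        if event["error"]:
            errors[event["browser"]] += 1
    return {browser: errors[browser] / views[browser] for browser in views}


# Invented page-view records collected from real user sessions.
page_views = [
    {"browser": "Chrome", "error": False},
    {"browser": "Chrome", "error": False},
    {"browser": "Chrome", "error": False},
    {"browser": "Safari", "error": True},
    {"browser": "Safari", "error": False},
]
print(error_rate_by_browser(page_views))  # {'Chrome': 0.0, 'Safari': 0.5}
```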
In order to surface these insights, you'll need to look for a real-user monitoring tool that makes it easy to visualize real user interaction data. The easier it is to visualize your metrics, the more likely you are to form actionable insights that help improve your digital products.
Google Analytics offers some basic RUM functionality, as it lets you identify sources of traffic to your site and landing pages, and user click paths. This is a great starting point for high-level analysis, but if you want to perform more complicated analysis of your site's performance as users browse it, you'll need more sophisticated RUM tools. Google Analytics also samples user sessions, so it will miss some page views that other RUM tools are able to capture. Those tools can also provide complete session-level information, whereas Google Analytics does not.
Top RUM tools include: Pingdom, AppDynamics, and Dynatrace.
Application Performance Monitoring (APM)
If you have a high-traffic website, where any inefficiency in your application or in page load time will negatively impact many of your customers, then you'll want to consider using application performance monitoring tools. APM tools not only help you identify errors and potential issues that could impact the end-user experience, but can also help you diagnose the possible causes, whether they stem from an external service dependency or from a certain function call in your code that needs to be optimized.
If your website uses many services, some APM software systems, such as New Relic, are able to create service maps, which can help you understand how different services can impact different components of your site. This is helpful, since if one service is having issues, the service map can help you understand if any other downstream components might be adversely impacted. Alternatively, if you are seeing issues in a service, the service map can help you understand where to look when trying to diagnose the source of the issue.
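Conceptually, a service map is a dependency graph, and finding the downstream components affected by a failing service is a graph traversal. The sketch below is not New Relic's API or data model; the service names and the impacted_by function are hypothetical.

```python
from collections import deque


def impacted_by(dependents, failing):
    """Return every service that directly or indirectly depends on `failing`.

    dependents maps a service to the list of services that call it.
    """
    seen, queue = set(), deque([failing])
    while queue:
        service = queue.popleft()
        for dependent in dependents.get(service, []):
            if dependent not in seen and dependent != failing:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical service map for an online store.
service_map = {
    "database": ["inventory", "accounts"],
    "inventory": ["checkout", "search"],
    "accounts": ["checkout"],
    "checkout": ["storefront"],
    "search": ["storefront"],
}
print(sorted(impacted_by(service_map, "inventory")))
# ['checkout', 'search', 'storefront']
```

Running the same traversal over the reversed graph answers the opposite question: which upstream services could be the source of an issue you are seeing.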
Top application performance monitoring tools include: New Relic, AppDynamics, Dynatrace, and Riverbed.
Infrastructure monitoring
Infrastructure monitoring tools help you monitor the health of your servers, by enabling you to track important infrastructure health metrics, such as CPU, memory, disk, load, and network health. Whenever any anomalies are detected, the infrastructure monitoring tools will alert you, and will typically enable you to correlate issues in your infrastructure with system changes or code releases, so that you can identify the source of the problem.
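The simplest form of this kind of alerting is a fixed threshold per metric. The sketch below checks one sample of server metrics against limits; the metric names and threshold values are assumptions, and real tools typically go further by detecting anomalies relative to past behavior.

```python
# Assumed limits, in percent; tune these for your own servers.
THRESHOLDS = {"cpu_percent": 90.0, "memory_percent": 85.0, "disk_percent": 80.0}


def breaches(sample, thresholds=THRESHOLDS):
    """Return the names of metrics in `sample` that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) > limit
    )


# Invented reading from one server.
reading = {"cpu_percent": 95.0, "memory_percent": 40.0, "disk_percent": 81.0}
print(breaches(reading))  # ['cpu_percent', 'disk_percent']
```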
Whereas APM helps you extract granular details about your performance (how each service is performing, or even how certain function calls are performing), infrastructure monitoring typically helps you monitor high-level details about your servers' performance. In the past, these have been provided as separate tools. However, some companies provide both APM and infrastructure monitoring capabilities in one system, which gives you a complete view of your system's performance.
Top infrastructure monitoring tools include: Datadog, SolarWinds, ScienceLogic, and Nagios.
What to look for in a monitoring tool
What do you really need?
If you have a basic website or blog, and don't receive much traffic, you probably only need a free, basic monitoring tool like Uptime Robot, which will ping your website periodically to make sure that it's up. However, if you receive a lot of traffic, or have a more involved application, then it's worth looking into RUM, APM, and infrastructure monitoring tools, which will typically cost money, but will be a worthwhile investment.
Is the trial easy to set up?
Many monitoring tools offer a free trial, which gives you an idea of how the product will work in your actual environment. When you start your trial, take note of how easy or difficult the setup is; if it goes smoothly for you, consider asking a colleague to sign up for a trial as well, to double-check that it's just as easy for them. The easier the tool is to set up, the more likely you'll be able to experience the full benefits of the trial, and get a good sense of whether or not it will work for your needs.
If you are able to sign up for an account, and you start seeing real data displayed in dashboards, you can probably take it as a good sign that it is a quality product that will deliver what you need.
Can you analyze performance trends over time?
You'll need a tool that retains your data for a sufficiently long period of time, so that you can analyze trends over time. Some tools will even allow you to set up automatic alerts when they detect potential issues.
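As a simple example of trend analysis on retained data, the sketch below compares the average page load time of the most recent week with the week before it. The function name load_time_trend, the 7-day window, and the sample values are assumptions for illustration.

```python
from statistics import mean


def load_time_trend(daily_ms, window=7):
    """Return the relative change between the last `window` days and the
    `window` days before them (0.25 means 25% slower)."""
    if len(daily_ms) < 2 * window:
        raise ValueError("need at least two full windows of data")
    baseline = mean(daily_ms[-2 * window:-window])
    recent = mean(daily_ms[-window:])
    return (recent - baseline) / baseline


# Invented daily average page load times in milliseconds.
history = [200] * 7 + [250] * 7
print(f"{load_time_trend(history):+.0%}")  # +25%
```

A comparison like this only works if the tool keeps at least two windows' worth of history, which is why the retention period matters.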
How accurate are the alerts?
Alerts are only useful if you trust them. If the monitoring tool is too sensitive, and reports too many false positives (i.e., it thinks that something is wrong when in fact everything is OK), then you'll likely start ignoring the alerts. You'll want to make sure that when the monitoring tool alerts you that something is wrong, there actually is an issue worth investigating. On the flip side, you'll also want to make sure that whenever there actually is an issue, the tool alerts you, so that you don't miss anything important.
It's a bonus if the alerts can also be routed to your collaboration or incident-response tool of choice, such as Slack or PagerDuty, or sent as SMS messages.
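The two failure modes described above, false alarms and missed issues, can be measured during a trial with two standard ratios: precision (how many alerts pointed at a real issue) and recall (how many real issues triggered an alert). The sketch below assumes you have reviewed a period of alerts by hand; the counts are invented.

```python
def alert_quality(true_alerts, false_alarms, missed_issues):
    """Return (precision, recall) for a period of reviewed alerts.

    true_alerts:   alerts that pointed at a real issue
    false_alarms:  alerts fired when nothing was wrong
    missed_issues: real issues that triggered no alert
    """
    fired = true_alerts + false_alarms
    real = true_alerts + missed_issues
    precision = true_alerts / fired if fired else 0.0
    recall = true_alerts / real if real else 0.0
    return precision, recall


# Invented counts from a one-month trial.
precision, recall = alert_quality(true_alerts=18, false_alarms=6, missed_issues=2)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.90
```

Low precision means your team will learn to ignore alerts; low recall means issues are slipping through unnoticed.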
Are the dashboards easy to customize and share?
The more comfortable your teams are with creating custom dashboards and sharing them with each other, the more effectively your company will be able to monitor all of your services, tools, and applications. Custom dashboards will also encourage teams to be more proactive about choosing the metrics they watch in order to detect and address issues over time.
Is it SaaS or on-premise?
Software-as-a-Service (SaaS) provides a lot of flexibility, including automatic software upgrades, and automatic data backups. However, some customers, especially those in industries with stringent security requirements, may only be able to use on-premise solutions. If you're in one of these industries, you'll want to make sure that the monitoring software you select offers an on-premise version, or meets your stringent security standards.