Selecting a Monitoring Solution
Monitoring is a critical part of any application infrastructure, and selecting the right monitoring solution involves weighing many criteria. Solutions differ in the types of monitoring they provide, the system deployments they can support, and the features they deliver.
There are several types of monitoring, each producing distinct output for different stakeholders.
- Functional – is the system performing its required operations without error?
- Non-functional – is the system handling the load and processing transactions at the rate it was designed for (performance/capacity monitoring)? Can it measure a baseline load and then report variance from that baseline, or flag anomalous behavior? Can it display daily, weekly or monthly variance? Can it use trend analysis of metrics over time to predict when the system’s capacity is likely to be exhausted?
- Audit – are all sensitive interactions with the system, such as changes, administrative actions and restarts, being captured accurately? Can this information be retained in long-term storage for the period required by organisational policy or relevant legislation?
- Security – are incidents that may represent a malicious attack on the system captured and presented to the teams that require this information?
- Operational Planning – are capacity-based statistics captured that allow for capacity management and planning?
- Business Planning – is the system succeeding in its intended function? Are the features of the system delivering the anticipated business value? Is uptake of the system meeting business expectations? This category of monitoring can extend into analytics and big data, so when assessing requirements here, consider what other analytics collection is already in place.
- Tuning – does the monitoring help determine how the system should be tuned? Does it allow profiling that application developers or system administrators can use to improve the performance and efficiency of the system?
- Continuous or Ad-hoc – is the monitor providing 24×7 coverage for detecting outages in a production system, or is it collecting detailed metrics during performance testing? Can one monitoring tool provide for both roles, or are different tools required?
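To make the non-functional criteria more concrete, the sketch below shows two capabilities worth probing in a candidate tool: flagging samples that deviate from a measured baseline, and extrapolating a linear trend to estimate when capacity will be exhausted. The function names, the three-standard-deviation threshold and the sample figures are illustrative assumptions, not taken from any product; real tools often use more sophisticated models (for example, ones that account for daily or weekly seasonality).

```python
from statistics import mean, stdev


def anomalies(baseline, samples, k=3.0):
    """Return (index, value) pairs for samples lying more than k standard
    deviations from the baseline mean. k=3.0 is an assumed default."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [(i, x) for i, x in enumerate(samples) if abs(x - mu) > k * sigma]


def days_until_exhausted(daily_usage, capacity):
    """Fit a least-squares line to one usage sample per day and estimate how
    many days after the last sample the trend reaches capacity.
    Returns None when usage is flat or falling."""
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(daily_usage)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, daily_usage)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    day_at_capacity = (capacity - intercept) / slope
    # Days remaining, counted from the most recent sample (day n - 1).
    return max(0.0, day_at_capacity - (n - 1))


if __name__ == "__main__":
    print(anomalies([100, 102, 98, 101, 99], [101, 130, 99]))
    print(days_until_exhausted([50, 52, 54, 56, 58], capacity=100))
```

With usage growing by 2 units per day from 50, the fitted line reaches 100 at day 25, so the sketch reports 21 days remaining after the fifth sample. A tool that offers this kind of forecast, ideally with seasonality handled, directly supports the capacity questions above.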
It is worth noting that one set of monitored data may be used by different stakeholders for different purposes. Different stakeholders may have significantly different timeliness and retention requirements for the results that the monitoring solution delivers to them.
Any monitoring solution must be able to accommodate the way your systems and applications are deployed and hosted.
- Is the monitor non-intrusive? Does the overhead it imposes on the monitored system still allow the agreed response times to be met?
- Does the monitor generate or require excessive network bandwidth to collect the metrics it needs to perform its function?
- Is the monitor suitable for on-premises and cloud-based monitoring?
- Does the monitor have licensing constraints that may limit the scope or volume of the monitoring?
The monitoring solution should be flexible enough to collect the data it needs using a variety of topologies, so that it can align with the topology of the infrastructure being monitored.
Monitoring Features and Considerations
Different monitoring solutions provide different sets of features. When considering a solution, you should identify which features you require. Some features you may wish to examine are:
- Log aggregation – allows capturing, indexing and searching of system logs.
- Collection of system metrics.
- Rules and triggers for alerting – provide a facility to set conditions and thresholds based on the expected system behavior.
- Scheduled reports – query the monitored metrics and produce a report to illustrate operational performance and rates of incidents.
- Dashboards – visually display the monitored metrics, and allow filtering, selecting time windows and searching.
- Customisation for business-specific alerts and reports.
- Correlations across nodes. Can the monitor intelligently recognise transactions passing across a cluster of identical nodes, and correlate requests and responses across the cluster?
- Dynamic adjustment of the level of detail monitored.
- Value of monitoring – does the monitoring system justify its cost by reducing or removing other costs associated with outages, problem determination, damage to business reputation and so on?
- Reliability – will the monitor itself perform its function reliably?
- Specific or Generic – is the monitor targeted at a specific product, or is it designed to monitor a wide range of products?
- Enterprise capability – can the monitor scale and aggregate metrics and logs from a large number of systems and nodes across different data centers or cloud providers?
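When assessing the rules and triggers feature, it helps to know what a threshold rule actually evaluates. The sketch below is a hypothetical, minimal rule engine, not any product’s API: a rule fires when its condition holds for a given number of consecutive samples, a common way to avoid alerting on a single transient spike. The `Rule` class, its fields and the CPU example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Rule:
    """Hypothetical alert rule: fire when `condition` holds for
    `for_samples` consecutive samples of a single metric."""
    name: str
    condition: Callable[[float], bool]
    for_samples: int = 1


def evaluate(rule: Rule, samples: Iterable[float]) -> List[int]:
    """Return the sample indices at which the rule fires. The rule fires once
    when a run of matching samples first reaches `for_samples`, and re-arms
    only after a non-matching sample resets the run."""
    fired = []
    run = 0
    for i, value in enumerate(samples):
        if rule.condition(value):
            run += 1
            if run == rule.for_samples:
                fired.append(i)
        else:
            run = 0
    return fired


if __name__ == "__main__":
    high_cpu = Rule("high-cpu", lambda pct: pct > 90, for_samples=3)
    print(evaluate(high_cpu, [95, 96, 80, 91, 92, 93, 94, 50]))
```

Here the first two high readings are ignored because the run is broken at 80, and the rule fires once, at index 5, on the third consecutive reading above 90. When comparing tools, check whether their rule syntax can express this kind of duration condition and whether alerts re-arm sensibly.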
To build a set of selection criteria, consider the monitoring types, stakeholders, system deployment and features that are relevant to your application or business. This gives you a clear process for selecting the monitoring system that best fits your needs.
Ben Stringer is the Principal Consultant at Syntegrity Solutions, a consultancy specialising in Integration, APIs and Identity Management. He has over 25 years of experience in the IT industry, and enjoys working with customers from many different industries on their integration and API projects.