Effective DataPower Governance (Auditing)
In this blog I will cover how to effectively audit your trusted IBM DataPower deployments, whether they are standalone devices, on-premise, or deployed in containers in the cloud, e.g. as part of the IBM API Connect solution. We will describe the facilities DataPower provides for auditing and show how they can be harnessed using Square Bubble and Splunk.
First of all, let us consider why this is important. DataPower is a core component in numerous critical systems which perform operations varying from financial transactions to publishing of sensitive data. It is therefore imperative that the deployment is audited to ensure adequate steps are taken both to manage the environment and to detect and validate changes made. This is important from a generic ITIL (change and incident management) perspective but can also come under more specific regulations such as PCI DSS, HIPAA, GDPR or other privacy laws in various jurisdictions.
There are two main themes to auditing which will be discussed here. They are concerned with:
Detecting configuration changes, and
Policy based config audit, which covers more generic patterns to see if configurations vary from a specific gold standard
Detecting Config Changes
When detecting config changes, an auditor will typically want to know what was changed, when it was changed, who made the change and why the change was made. The latter is the hardest for us to achieve as authorisation to make a change is typically out of scope but that may not always be the case. Incidents and change requests can also be included within Splunk, such as Episode Review in Splunk ITSI. Linking a change to an incident or change request would then be the next challenge. I will focus on the what, when & who aspects first…
DataPower has extensive logging which can allow you to see configuration changes that are made. The downside to this approach is that these context-specific log messages are scattered through the many available categories. Depending on your needs, this could result in a large number of event subscriptions being added to a log target. Whilst this may be possible, the next issue is that you will also receive a lot of messages that you are not interested in, so further work is required to reduce the subscriptions to just what you need. In our experience, we have not come across a customer who has done this, or has even attempted to do this.
Our solution gathers the configuration objects on a scheduled basis (typically once every 24 hours during a quiet period, such as overnight). This allows us to compare objects over time. We can thus determine what has changed (or been added or deleted), but we need an additional log to see when, as the comparison can only tell you when a change was detected rather than when it was made. The additional log is a log target that captures CLI, audit, auth and RBM logs to provide greater detail of what management activities and logins have taken place.
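As a rough illustration of the comparison step, the sketch below hashes each harvested object (hashing is discussed alongside Figure 1 below) and classifies objects as added, deleted or changed between two harvests. It is not the Square Bubble implementation: the snapshot layout (a dictionary keyed by object type and name), the function names, the extra log targets and the "admin 1" value are all assumptions made for this example.

```python
import hashlib
import json


def object_hash(config_object: dict) -> str:
    """Return a stable SHA-256 digest of one configuration object."""
    # Sorting the keys keeps the digest independent of field order.
    canonical = json.dumps(config_object, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_snapshots(previous: dict, current: dict) -> dict:
    """Classify objects as added, deleted or changed between two harvests.

    Both arguments map a (type, name) key to the harvested object.
    """
    before = {key: object_hash(obj) for key, obj in previous.items()}
    after = {key: object_hash(obj) for key, obj in current.items()}
    return {
        "added": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "changed": sorted(
            key for key in before.keys() & after.keys() if before[key] != after[key]
        ),
    }


# Hypothetical harvests taken 24 hours apart.
yesterday = {
    ("LogTarget", "sqrbbl-syslog"): {"UserSummary": "admin 1"},
    ("LogTarget", "old-target"): {"UserSummary": ""},
}
today = {
    ("LogTarget", "sqrbbl-syslog"): {"UserSummary": "admin 2"},
    ("LogTarget", "new-target"): {"UserSummary": ""},
}

result = compare_snapshots(yesterday, today)
print(result)
```

Comparing digests rather than whole objects keeps the check cheap, at the cost of only knowing that something changed, not which field.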
Let's see how a trivial change to the comments on a log target may be presented:
Figure 1: Square Bubble DataPower Config in Splunk
Figure 1 shows that the log target object called sqrbbl-syslog has been changed during the period observed. To detect changes we generate a hash for each object as it is being processed. This simplifies the process, as we only consider the hash field for the coarse-grained change detection presented in Figure 1. A distinct count of 2 means two different hashes were observed, so only 1 change (potentially a set of changes) was made. If we drill down on this we get the following details:
Figure 2: Square Bubble DataPower Config Explorer in Splunk
The UserSummary field has been changed. At this point we can see the values it has been assigned over the time period. Below the table (omitted for brevity), the full list of config objects that have been observed, and in some cases changed, is displayed. This allows us to determine which is the last observed state. We can go further: we are also ingesting the additional log to allow free searching, so we can find the log entries from the CLI that include the value "admin 2". From these we get the transaction ID and then search all syslog messages with that transaction ID. This provides the following events (cut down for simplicity):
Figure 3: Free form syslog search in Splunk
The when can now be determined from the syslog event (not displayed to enhance readability, but trust me, it is there). The who (in this case the user) can also be determined from the syslog event and is admin. The IP address from which the change was made is also logged, as 172.18.0.1.
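The search described above can be sketched outside Splunk as well. The example assumes the syslog messages have already been parsed into dictionaries. The field names (time, category, tid, user, client_ip, message), the timestamps, the transaction IDs and the message text are hypothetical and do not reflect the actual DataPower log layout; only the value "admin 2", the user admin and the address 172.18.0.1 come from the example above.

```python
def correlate_by_value(events: list[dict], value: str) -> list[dict]:
    """Return all events sharing a transaction ID with a CLI event mentioning `value`."""
    # Step 1: collect the transaction IDs of CLI events that contain the value.
    transaction_ids = {
        event["tid"]
        for event in events
        if event["category"] == "cli" and value in event["message"]
    }
    # Step 2: return every event, of any category, with one of those IDs.
    return [event for event in events if event["tid"] in transaction_ids]


# Hypothetical, already parsed events.
events = [
    {"time": "2024-01-08T10:15:02", "category": "cli", "tid": "4711",
     "user": "admin", "client_ip": "172.18.0.1", "message": 'summary "admin 2"'},
    {"time": "2024-01-08T10:15:03", "category": "audit", "tid": "4711",
     "user": "admin", "client_ip": "172.18.0.1", "message": "configuration modified"},
    {"time": "2024-01-08T11:00:00", "category": "auth", "tid": "4712",
     "user": "monitor", "client_ip": "172.18.0.9", "message": "login succeeded"},
]

related = correlate_by_value(events, "admin 2")
for event in related:
    print(event["time"], event["user"], event["client_ip"], event["message"])
```

The when, who and where can then be read directly from the returned events.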
Another challenge here is the variation in terminology within DataPower. The Web GUI presents this field as "Comments" for a “Log Target”, the CLI defines it as “summary” for a “logging target” (as above) and the config object describes it as “UserSummary” of type “LogTarget” (as in Figure 2). The only item that is consistent is the name field, so make sure you name your DataPower objects wisely to avoid confusion. This is another rule you could enforce using the techniques described herein. It would also be really useful if IBM would produce a mapping for us, i.e. CLI ⇔ Web GUI ⇔ REST config objects. Splunk could then use lookups to allow us to arrive at a consistent set of terms.
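In the absence of an official mapping, a hand-maintained lookup table is one possible stopgap. The sketch below holds the single row we can state with confidence from the example above; the table layout, the interface keys (web_gui, cli, config) and the function name are assumptions for illustration, and any further rows would have to be worked out by hand.

```python
# Each row names one field as (object term, field term) in every interface.
# Only this one row is taken from the log target example in this article.
TERMINOLOGY = [
    {
        "web_gui": ("Log Target", "Comments"),
        "cli": ("logging target", "summary"),
        "config": ("LogTarget", "UserSummary"),
    },
]


def translate(source: str, object_term: str, field_term: str, target: str):
    """Translate an (object, field) pair from one interface's terms to another's.

    Returns None when the pair is not in the table.
    """
    for row in TERMINOLOGY:
        if row[source] == (object_term, field_term):
            return row[target]
    return None


print(translate("config", "LogTarget", "UserSummary", "web_gui"))
print(translate("cli", "logging target", "summary", "config"))
```

The same rows could be exported as a CSV file and loaded into Splunk as a lookup.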
Now that the who and where from have been determined, that leaves the why and the question of whether this is an acceptable change. Splunk allows users to perform analytical queries to help pinpoint events that are either expected or not expected. Let us assume that the organisation in question has a policy that says all changes on operational systems must only occur during a defined maintenance period. The change in question can be assessed and an alert raised should the change take place outside of the acceptable periods.
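The maintenance period rule is simple to express in code. The sketch below assumes a hypothetical policy that allows changes only on Sundays between 01:00 and 05:00; the window, the names and the sample timestamps are made up for the example, and in practice the check would more likely be written as a Splunk search.

```python
from datetime import datetime, time

# Hypothetical policy: weekday number (Monday is 0) -> (start, end) of the window.
MAINTENANCE_WINDOWS = {6: (time(1, 0), time(5, 0))}  # Sundays, 01:00 to 05:00


def outside_maintenance(changed_at: datetime) -> bool:
    """Return True when a change falls outside every maintenance window."""
    window = MAINTENANCE_WINDOWS.get(changed_at.weekday())
    if window is None:
        return True  # no window is defined for this day of the week
    start, end = window
    # The start is inclusive and the end exclusive.
    return not (start <= changed_at.time() < end)


sunday_night = datetime(2024, 1, 7, 2, 30)      # a Sunday, inside the window
monday_morning = datetime(2024, 1, 8, 10, 15)   # a Monday, no window at all

for moment in (sunday_night, monday_morning):
    status = "ALERT" if outside_maintenance(moment) else "ok"
    print(moment.isoformat(), status)
```

The timestamp fed into such a check should come from the syslog event, not from the harvest, since the harvest only records when the change was detected.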
The limitation of this detection is that the frequency of harvesting the config could be known by the administrators or others who have access. Changes could be made and then undone between two harvests so that they go undetected. This is another reason to capture the additional log. This log should be used to capture logins (including failed logins) as well as all user actions. Auditing usually also involves searching for very specific events (which I won’t list here). The additional log can be used to find specific events as well as detecting logins and attempts to change config.
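One way to use the additional log against this blind spot is to look for objects that the log reports as modified between two harvests even though their hash is identical in both. The sketch assumes the names of the modified objects have already been extracted from the log; how that extraction is done is not covered here, and the function name, object names and hash values are hypothetical.

```python
def undone_changes(hashes_before: dict, hashes_after: dict, logged_as_modified: set) -> list:
    """Return objects the log reports as modified whose hash did not change.

    Such objects were probably changed and then restored between harvests.
    """
    return sorted(
        name
        for name in logged_as_modified
        if name in hashes_before and hashes_before[name] == hashes_after.get(name)
    )


# Hypothetical object hashes from two consecutive harvests.
before = {"sqrbbl-syslog": "h1", "payments-gw": "h2"}
after = {"sqrbbl-syslog": "h1", "payments-gw": "h3"}

# Hypothetical object names taken from log events between the two harvests.
modified = {"sqrbbl-syslog", "payments-gw"}

suspicious = undone_changes(before, after, modified)
print(suspicious)
```

Objects returned by such a check are not proof of wrongdoing, but they are exactly the changes a hash comparison alone would miss.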
Comparing types and relationships
As an aside, now that we are harvesting DataPower configuration objects we can also use the Square Bubble DataPower Config Explorer to explore what range of values exists for objects of the same type, across domains and devices. My favourite is to compare all objects that include a Ciphers field. I would expect them to vary between client and server profiles but not to vary for all server profiles, and similarly for all client profiles. Such variances could exist but should be investigated to ensure policies are being maintained.
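A comparison of this kind can be sketched as a grouping of field values by object type. The object layout, the type names (SSLServerProfile, SSLClientProfile), the device, domain and profile names, and the cipher strings below are assumptions for illustration; they are not taken from a real harvest.

```python
from collections import defaultdict


def values_by_type(objects: list[dict], field: str) -> dict:
    """Group the distinct values of `field` by object type.

    Each value maps to the (device, domain, name) of the objects using it.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for obj in objects:
        if field in obj:
            location = (obj["device"], obj["domain"], obj["name"])
            grouped[obj["type"]][obj[field]].append(location)
    return {obj_type: dict(values) for obj_type, values in grouped.items()}


def types_with_variance(objects: list[dict], field: str) -> list:
    """Return the object types for which `field` takes more than one value."""
    return sorted(
        obj_type
        for obj_type, values in values_by_type(objects, field).items()
        if len(values) > 1
    )


# Hypothetical harvested objects; the cipher strings are placeholders.
objects = [
    {"device": "dp1", "domain": "prod", "type": "SSLServerProfile",
     "name": "server-a", "Ciphers": "ECDHE_RSA_WITH_AES_256_GCM_SHA384"},
    {"device": "dp2", "domain": "prod", "type": "SSLServerProfile",
     "name": "server-b", "Ciphers": "RSA_WITH_3DES_EDE_CBC_SHA"},
    {"device": "dp1", "domain": "prod", "type": "SSLClientProfile",
     "name": "client-a", "Ciphers": "ECDHE_RSA_WITH_AES_256_GCM_SHA384"},
]

print(types_with_variance(objects, "Ciphers"))
```

Any type reported here is a candidate for investigation rather than an automatic policy breach.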
In addition, we harvest the links between config objects. Links are presented in a way that lets us determine the device, domain, type and name of the object being referenced. This allows us to check the types of objects being used within the context of others. For example, we can validate that all gateways use an available HTTPS Handler and alert if an HTTP Handler is used.
Similarly, if we detect a change (such as removal of support for the HTTP GET method) in an HTTPS Handler, we can quickly work out which services are affected using a simple Splunk query. Beyond the listen address and port, this information is not readily available in the DataPower Web GUI.
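Both link queries can be sketched over a plain list of links. The data model below (an object reference made of device, domain, type and name, and a link from a source to a target) follows the description above, but the class names, the type strings and the sample objects are assumptions for this example, not the format Square Bubble actually stores.

```python
from typing import NamedTuple


class ObjectRef(NamedTuple):
    device: str
    domain: str
    type: str
    name: str


class Link(NamedTuple):
    source: ObjectRef  # the object holding the reference, e.g. a gateway
    target: ObjectRef  # the object being referenced, e.g. a handler


HTTP_HANDLER = "HTTPSourceProtocolHandler"    # assumed type name
HTTPS_HANDLER = "HTTPSSourceProtocolHandler"  # assumed type name


def gateways_on_plain_http(links: list[Link]) -> list[ObjectRef]:
    """Return every object that references a plain HTTP handler."""
    return sorted({link.source for link in links if link.target.type == HTTP_HANDLER})


def services_using(links: list[Link], handler: ObjectRef) -> list[ObjectRef]:
    """Return every object that references the given handler."""
    return sorted({link.source for link in links if link.target == handler})


# Hypothetical objects and links.
secure = ObjectRef("dp1", "prod", HTTPS_HANDLER, "https-443")
plain = ObjectRef("dp1", "prod", HTTP_HANDLER, "http-80")
payments = ObjectRef("dp1", "prod", "MultiProtocolGateway", "payments")
legacy = ObjectRef("dp1", "prod", "MultiProtocolGateway", "legacy")
links = [Link(payments, secure), Link(legacy, secure), Link(legacy, plain)]

print(gateways_on_plain_http(links))   # candidates for an alert
print(services_using(links, secure))   # impact of a change to https-443
```

The first function answers the policy question and the second the impact question; both are lookups over the same link data.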
Policy based config audit
Often policies need to be updated to suit evolving environmental needs, such as the scramble to replace all uses of SHA-1 with SHA-2 in X.509 certificates. Being able to codify such a check and run it across the suite of DataPower instances in an analytic tool such as Splunk will ensure the policy is both checked and fit for purpose.
So let us walk through a real-world example. We shall assume we have a maintained list of supported ciphers for our secure connections with DataPower stored in a database. We can schedule jobs to capture the appropriate configuration data. A process (such as the Square Bubble agent) can perform this task and store the data in Splunk. Splunk can then check that each cipher is in the list of supported ciphers in a scheduled report. The same search could also be used to raise an alert. I’m not brave enough to suggest we invoke processes to set these to an acceptable default, but it could in theory be done.
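The check itself is a set membership test. In the sketch below a Python set stands in for the database list of supported ciphers, and the profile names and cipher strings are placeholders chosen for the example; in the approach described above the equivalent logic would run as a scheduled Splunk search.

```python
def unsupported_ciphers(profiles: dict, supported: set) -> dict:
    """Map each profile name to the ciphers it uses that are not supported.

    Profiles that comply with the policy are left out of the result.
    """
    violations = {}
    for name, ciphers in profiles.items():
        rejected = [cipher for cipher in ciphers if cipher not in supported]
        if rejected:
            violations[name] = rejected
    return violations


# Stand-in for the maintained list held in the database (hypothetical content).
SUPPORTED = {
    "ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}

# Hypothetical harvested profiles and their configured ciphers.
profiles = {
    "server-a": ["ECDHE_RSA_WITH_AES_256_GCM_SHA384"],
    "server-b": ["ECDHE_RSA_WITH_AES_128_GCM_SHA256", "RSA_WITH_3DES_EDE_CBC_SHA"],
}

report = unsupported_ciphers(profiles, SUPPORTED)
print(report)
```

An empty result means every harvested profile complies; anything else can feed the report or the alert.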
The benefit of this approach is that there is a single source of policy which is validated in a central point using the actual configuration details. In summary:
DataPower can provide configuration details and this can be invoked on a scheduled basis (there are many approaches to do this),
Square Bubble can take this data and deliver it to Splunk with a hash to simplify the detection of changes, and
Splunk can enable the validation checks to be made and raise alerts as appropriate. In addition, Splunk can be used to query the relationships between objects to allow you to see what changes may impact critical services.