Effective MQ Auditing
IBM (WebSphere) MQ is an established and trusted messaging solution used as a core component in many deployments worldwide. Whilst it has a rich history, the product continues to evolve so that it can be deployed in the latest cloud and container environments. We would like to pay tribute to the team for continuing their excellent work.
In this blog we will cover how to effectively audit your trusted MQ deployments wherever they are deployed, on-premises or in the cloud. This post has been produced in conjunction with our colleagues at Syntegrity Solutions (specifically Neil Casey). We will describe what facilities MQ provides to enable auditing and show how these services can be harnessed using Square Bubble and Splunk.
First let us consider why this is important. MQ is a core component in numerous critical systems, from financial transactions to the publishing of sensitive data. It is therefore imperative that the deployment is audited to ensure adequate steps are taken both to manage the environment and to detect and validate any changes made. This is important from a generic ITIL perspective but can also come under more specific regulations such as PCI DSS, HIPAA, GDPR or other privacy laws in various jurisdictions.
There are two main themes to auditing which we will discuss here:
- Detecting configuration changes, and
- Policy-based config audit, which covers more generic patterns to see whether the config varies from a specific gold standard or defined policy
Detecting Config Changes
MQ readily reports config changes as configuration events (see the MQ docs for more details), published to the SYSTEM.ADMIN.CONFIG.EVENT queue. These events cover objects being created, changed and deleted. However, you have to switch on this capability (it is controlled by the queue manager's CONFIGEV attribute) and have a process running to consume the messages. The messages are encoded in PCF (Programmable Command Format), an IBM proprietary format. They provide details of the config event as well as the user and application used to make the change.
Let us use a trivial example of an object change, where a queue has its max depth set to 2500. For a change like this, MQ sends two messages:
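As a rough sketch of the "switch on" step, the snippet below builds the MQSC that enables configuration and command events and shows one way it might be fed to runmqsc. CONFIGEV and CMDEV are the queue manager attributes that control these events; the helper names and the queue manager name QM1 are assumptions made for illustration, not part of any product.

```python
import subprocess


def build_enable_events_script() -> str:
    """Return MQSC text that turns on configuration and command events."""
    return "ALTER QMGR CONFIGEV(ENABLED) CMDEV(ENABLED)\n"


def apply_script(qmgr_name: str, script: str) -> subprocess.CompletedProcess:
    """Pipe MQSC text into runmqsc for the named queue manager.

    Assumes runmqsc is on the PATH and that the caller is authorised
    to administer the queue manager.
    """
    return subprocess.run(
        ["runmqsc", qmgr_name],
        input=script,
        text=True,
        capture_output=True,
        check=False,
    )


# Example usage ("QM1" is a hypothetical queue manager name):
#   result = apply_script("QM1", build_enable_events_script())
#   print(result.stdout)
```

Enabling the events is only half of the job: something still has to read the event queues, otherwise the messages simply accumulate.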
- The object before the change, and
- The object after the change
The reader would then have to process the PCF config event message to determine the state of the attributes both before and after the change in order to see what actual change took place. A bit convoluted but an acceptable approach nonetheless.
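To make the before/after comparison concrete, here is a minimal sketch that assumes the two PCF messages have already been decoded into plain dictionaries. The attribute names used as keys are illustrative rather than the exact PCF identifiers.

```python
def diff_attributes(before: dict, after: dict) -> dict:
    """Return {attribute: (old, new)} for every attribute whose value differs."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)
        new = after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Hypothetical decoded "before" and "after" config events for one queue.
before_event = {"Q_NAME": "DEV.QUEUE.1", "MAX_Q_DEPTH": 3500, "MAX_MSG_LENGTH": 4194304}
after_event = {"Q_NAME": "DEV.QUEUE.1", "MAX_Q_DEPTH": 2500, "MAX_MSG_LENGTH": 4194304}

print(diff_attributes(before_event, after_event))
# prints {'MAX_Q_DEPTH': (3500, 2500)}
```

Only the attributes that differ are reported, which is the piece of information an auditor actually wants from the pair of events.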
There is a further event type, the command event, that can also be captured to determine what command was used to effect the change. Command events are published to the SYSTEM.ADMIN.COMMAND.EVENT queue. Again, the message is provided in PCF format and a process is required to read this queue. The command, including all its parameters, is encoded and needs to be reconstructed using the applicable lookups. If the command was issued from runmqsc, it will look different from the command as entered. However, there is a regular mapping between the two schemes, so although the syntax may vary, the more important semantics can be readily determined.
Let's see how this example may be presented:
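The mapping between the PCF lexicon and MQSC can be sketched with a few lookup tables. The example below assumes the command event has already been decoded into a dictionary of symbolic PCF names; the dictionary layout and the function name are hypothetical, and the tables cover only a handful of entries rather than the full set a real decoder would need.

```python
# Illustrative lookups only: a real decoder needs the complete PCF tables.
COMMAND_TO_MQSC = {"MQCMD_CHANGE_Q": "ALTER"}
QTYPE_TO_MQSC = {
    "MQQT_LOCAL": "QLOCAL",
    "MQQT_ALIAS": "QALIAS",
    "MQQT_REMOTE": "QREMOTE",
    "MQQT_MODEL": "QMODEL",
}
PARAM_TO_MQSC = {
    "MQIA_MAX_Q_DEPTH": "MAXDEPTH",
    "MQIA_MAX_MSG_LENGTH": "MAXMSGL",
    "MQCA_Q_DESC": "DESCR",
}


def to_mqsc(event: dict) -> str:
    """Rebuild an MQSC-style command from a decoded queue command event."""
    verb = COMMAND_TO_MQSC[event["command"]]
    params = dict(event["parameters"])  # copy so the caller's dict is untouched
    queue_name = params.pop("MQCA_Q_NAME")
    object_type = QTYPE_TO_MQSC[params.pop("MQIA_Q_TYPE")]
    parts = [f"{verb} {object_type}({queue_name})"]
    for pcf_name, value in params.items():
        # Fall back to the PCF name when no MQSC keyword is known.
        keyword = PARAM_TO_MQSC.get(pcf_name, pcf_name)
        parts.append(f"{keyword}({value})")
    return " ".join(parts)


# Hypothetical decoded command event for the max depth change.
command_event = {
    "command": "MQCMD_CHANGE_Q",
    "parameters": {
        "MQCA_Q_NAME": "DEV.QUEUE.1",
        "MQIA_Q_TYPE": "MQQT_LOCAL",
        "MQIA_MAX_Q_DEPTH": 2500,
    },
}

print(to_mqsc(command_event))
# prints ALTER QLOCAL(DEV.QUEUE.1) MAXDEPTH(2500)
```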
[Figure: MQ Event Explorer]
The figure above shows both the command and config events in the Square Bubble MQ Event Explorer in Splunk. The messages are in reverse chronological order from top to bottom (older messages are lower). On the fourth row we can see the command event that was submitted to change the max depth for the named queue. The reason field is set to “COMMAND MQSC”, so the command came from a runmqsc shell, although it is displayed in the PCF lexicon. Following the command event are two config events representing the before and after state of the queue object, in this case DEV.QUEUE.1, as shown in the object name field. In between these two events is a further event, generated by Square Bubble, which shows the actual change made: the max depth was changed from 3500 to 2500. Having this change summary event available makes it much easier for the reader to see what change has been made.
The user identifier and application name are also provided, so an auditor has a complete picture of who changed what and when. Be aware that the reliability of the “who” part of this information is dependent on how MQ itself is configured, and on security policy and enforcement at the organisation. It is common practice at many sites for the mqm account to be used to configure MQ objects. At some sites, this may even be mandated. This leads to all configuration changes appearing to come from the same (generic) account. Careful design and enforcement of MQ administrative access must be used to ensure that audited changes can be tied to the actual person who performed the change. That enforcement is not, however, the subject of this post.
Now that the “who” has been determined, that leaves the “why” and the question of whether this is an acceptable change. Splunk allows users to perform analytical queries to help pinpoint events that are either expected or not expected. Let us assume that the organisation in question has a policy that says max depth for non-critical queues (for which there is a list) can be set lower than 3000, whereas critical queues must be set to a minimum of 3500. If this queue is on the non-critical list, all is good: the change is acceptable within the policy. If, however, the queue is a critical queue (in which case naming it DEV.* should be called into question), the change is not acceptable and Splunk can raise an alert and/or invoke a process to reset the max depth to the acceptable level.
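The policy logic described above can be illustrated in a few lines. This is a Python sketch of the check rather than a Splunk search; the function name and queue names are hypothetical, and it assumes that any queue not on the non-critical list is treated as critical.

```python
CRITICAL_MIN_DEPTH = 3500  # minimum max depth for critical queues, per the example policy


def check_max_depth(queue_name: str, new_depth: int, non_critical_queues: set):
    """Return an alert message if the change breaks the policy, else None."""
    if queue_name in non_critical_queues:
        # Non-critical queues may be set to lower values under the policy.
        return None
    if new_depth < CRITICAL_MIN_DEPTH:
        return (
            f"ALERT: max depth of critical queue {queue_name} set to "
            f"{new_depth}, below the minimum of {CRITICAL_MIN_DEPTH}"
        )
    return None


# Hypothetical maintained list of non-critical queues.
non_critical = {"DEV.QUEUE.1", "DEV.QUEUE.2"}

print(check_max_depth("DEV.QUEUE.1", 2500, non_critical))      # acceptable change
print(check_max_depth("PAYMENTS.QUEUE", 2500, non_critical))   # policy violation
```

The same comparison could feed an automated reset of the max depth, although raising an alert for a person to review is the more cautious first step.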
Policy Based Config Audit
Policies often need to be updated to suit evolving environmental needs, such as the recent scramble to replace all uses of SHA-1 with SHA-2 in X.509 certificates. Being able to codify such a check and run it across the suite of queue managers in an analytic tool such as Splunk will ensure the policy is both checked and fit for purpose.
MQ has a further capability to help here. There is an option to report on object configurations using the MQSC REFRESH QMGR command with TYPE(CONFIGEV). This causes the current state of the selected objects to be sent to the SYSTEM.ADMIN.CONFIG.EVENT queue. The messages are once again in PCF format and so need to be decoded and forwarded to an analytics tool to be validated against the policy.
So let us walk through a real-world example. We shall assume we have a maintained list of supported ciphers for our MQ channels stored in a database. We can schedule jobs to run the REFRESH QMGR command for all channels on all queue managers. A process (such as the Square Bubble agent) can receive the config events and store them in Splunk. Splunk can then check, for each channel where an SSL cipher is or should be defined, that the cipher is on the acceptable list. If it is not, an alert can be raised. I’m not brave enough to suggest we invoke processes to set these to an acceptable default, but it could in theory be done.
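As an illustration, the snippet below builds the REFRESH QMGR TYPE(CONFIGEV) command for a given object type and prepares one command per queue manager, ready for whatever scheduler or runmqsc wrapper is in use. The function names and queue manager names are assumptions made for the sketch.

```python
def build_refresh_command(object_type: str = "CHANNEL", name: str = "*") -> str:
    """Return the MQSC command that republishes config events for matching objects."""
    return f"REFRESH QMGR TYPE(CONFIGEV) OBJECT({object_type}) NAME({name})"


def build_refresh_plan(queue_managers, object_type: str = "CHANNEL") -> dict:
    """Map each queue manager name to the MQSC command it should run."""
    return {qmgr: build_refresh_command(object_type) for qmgr in queue_managers}


# Hypothetical queue manager names.
plan = build_refresh_plan(["QM1", "QM2"])
for qmgr, command in plan.items():
    print(f"{qmgr}: {command}")
```

Restricting OBJECT and NAME to what the policy actually covers keeps the number of generated events, and therefore the load on the event queue, manageable.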
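The cipher check itself might look something like the sketch below. It assumes the channel config events have been decoded into dictionaries with QMGR, CHANNEL and SSLCIPH keys (a hypothetical layout), and the approved list and channel names are example data only.

```python
def find_cipher_violations(channels, allowed_ciphers, must_use_tls=frozenset()):
    """Return (qmgr, channel, reason) for every channel that breaks the cipher policy."""
    violations = []
    for channel in channels:
        qmgr = channel["QMGR"]
        name = channel["CHANNEL"]
        cipher = channel.get("SSLCIPH", "").strip()
        if cipher:
            if cipher not in allowed_ciphers:
                violations.append((qmgr, name, f"cipher {cipher} is not on the approved list"))
        elif name in must_use_tls:
            # A cipher should be defined for this channel but is missing.
            violations.append((qmgr, name, "no cipher defined"))
    return violations


# Example data only: approved list and decoded channel events are hypothetical.
approved = {"ECDHE_RSA_AES_256_GCM_SHA384", "TLS_RSA_WITH_AES_256_CBC_SHA256"}
channel_events = [
    {"QMGR": "QM1", "CHANNEL": "APP1.SVRCONN", "SSLCIPH": "ECDHE_RSA_AES_256_GCM_SHA384"},
    {"QMGR": "QM1", "CHANNEL": "APP2.SVRCONN", "SSLCIPH": "TRIPLE_DES_SHA_US"},
    {"QMGR": "QM2", "CHANNEL": "APP3.SVRCONN", "SSLCIPH": ""},
]

for violation in find_cipher_violations(channel_events, approved, must_use_tls={"APP3.SVRCONN"}):
    print(violation)
```

Keeping the approved list as data rather than code means a policy change, such as retiring a cipher, needs only a change to the list.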
The benefit of this approach is that there is a single source of policy which is validated at a central point using the actual configuration details. To summarise:
- MQ can provide configuration details, and this can be invoked on a scheduled basis (there are many approaches to doing this),
- Square Bubble can take those events, convert them from the proprietary PCF format and deliver them to Splunk, and
- Splunk can enable the validation checks to be made and raise alerts as appropriate.