MQ Cluster Workload Balancing with Preferred Locality - Part 2
Last week we looked at separating the traffic into geographic regions, but that configuration could not keep providing service if the data centre for a region was down or disconnected from the network.
This week we look at the first of 3 ways to enable a surviving site to take on messages from another geography in the event of an outage.
Enabling continued processing in a disaster
The company would also like to maximise the availability of the service, and ensure that if all instances of the service queue in either location are unavailable, the messages will be sent to the surviving site, despite the higher cost and slower response time. Messages should only be sent across to the distant data centre if the local data centre cannot be accessed.
This second requirement is not handled by the simple configuration of 2 separate clusters. If all 3 queue managers hosting Q1 in the EUROPE cluster fail (perhaps the communications link to the Paris Data Centre is cut), the messages from the LONDON1 queue manager cannot be processed, because there is nowhere to send them.
There are several approaches that could be used to address the two requirements together.
- Join the 2 clusters with a gateway queue manager
- Join all queue managers to both clusters
- Merge the clusters into a single cluster
Any of these approaches could lead to messages from any queue manager being distributed to any instance of the queue, which is not desired. MQ provides several attributes for queues and channels which allow the default load balancing to be tweaked to meet the company's needs.
These attributes are described in the IBM MQ documentation.
Join the 2 clusters with a gateway queue manager
This is the most complex and least resilient of the options, but provides good separation of responsibilities for administration of the separate clusters.
The Cluster Workload Priority (CLWLPRTY) of each cluster receiver channel in both clusters is increased. They should all be the same value, and greater than zero (let's say 5).
The CLWLPRTY of each instance of a real local queue called Q1 is also increased. Every instance again has the same value (say 5).
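As a sketch, the MQSC to raise those priorities might look like the following; the channel name is illustrative, and the same commands would be repeated on each queue manager hosting Q1, in both clusters, using that queue manager's own cluster receiver channel name:

```
* On PARIS1 (illustrative channel name)
ALTER CHANNEL('EUROPE.PARIS1') CHLTYPE(CLUSRCVR) CLWLPRTY(5)
ALTER QLOCAL('Q1') CLWLPRTY(5)
```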
A new queue manager is created which joins both the USA and EUROPE clusters. We'll call this queue manager BRIDGE1.
The CLWLPRTY of the cluster receiver channels for BRIDGE1 is less than the CLWLPRTY of the previous queue managers. It can be 0.
Define a QAlias definition on BRIDGE1, called Q1 (matching the existing queue). The target of the alias is also Q1. The alias is visible in both the USA and EUROPE clusters. This is achieved by creating a NAMELIST, setting both cluster names as entries in the namelist, and putting the namelist name in the CLUSNL attribute of the Q1 QAlias. The CLWLPRTY of the Q1 QAlias is set to 0 (below the CLWLPRTY of local queues called Q1).
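Pulling those steps together, the definitions on BRIDGE1 could be sketched like this (the channel name and connection name are illustrative, not taken from a real configuration):

```
* Namelist covering both clusters
DEFINE NAMELIST('BOTH.CLUSTERS') NAMES('USA', 'EUROPE')
* Cluster receiver channel joining BRIDGE1 to both clusters, at low priority
DEFINE CHANNEL('TO.BRIDGE1') CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('bridge1.example.com(1414)') +
       CLUSNL('BOTH.CLUSTERS') CLWLPRTY(0)
* Alias queue Q1, targeting the clustered Q1, visible in both clusters
DEFINE QALIAS('Q1') TARGET('Q1') CLUSNL('BOTH.CLUSTERS') CLWLPRTY(0)
```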
Now each queue manager that puts a message to Q1 sees all instances in its own cluster, including the QAlias on BRIDGE1.
The priority of the channels to BRIDGE1 is lower than the channels to the local queue managers, so it will only be used if none of them are available.
This configuration is shown in Figure 3, USA and Europe clusters bridged.
Let's assume a message is to be sent from LONDON1. The Data Centre in Paris is unavailable, so the channels to PARIS1, PARIS2, and PARIS3 are all in RETRYING state. The message is sent to BRIDGE1 instead.
When it arrives at BRIDGE1, the message cannot be put locally, because the definition is an alias. The alias resolves to a cluster queue also called Q1. The BRIDGE1 queue manager knows about six other instances of Q1: three in the USA cluster and three in the EUROPE cluster. The EUROPE cluster queue managers are all unavailable, so the message will be sent to one of the destinations in the USA cluster.
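You can check which instances of Q1 the BRIDGE1 queue manager knows about with a display command like this (a sketch; the exact output depends on your configuration):

```
* On BRIDGE1: list the cluster instances of Q1 and their priorities
DISPLAY QCLUSTER('Q1') CLUSTER CLUSQMGR CLWLPRTY
```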
This achieves the goal of getting the message to a location that can process it, but now we need to be able to send the response back.
To do this, a queue manager alias for each queue manager in the EUROPE cluster must be defined on the BRIDGE1 queue manager, and exposed to the USA cluster. Now when the queue manager in the USA cluster tries to send a response message back to LONDON1, it discovers the alias visible in the USA cluster on BRIDGE1, and sends it there. BRIDGE1 resolves the actual LONDON1 queue manager, so the message is forwarded to the correct final destination.
And to make it all work in the opposite direction, the queue managers in the USA cluster each need an alias definition on BRIDGE1, exposed to the EUROPE cluster too.
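For example, the queue manager alias that lets responses reach LONDON1 could be defined on BRIDGE1 like this (a sketch; one such definition is needed for each queue manager on the other side, and likewise in the opposite direction):

```
* On BRIDGE1: a queue manager alias for LONDON1, advertised to the USA cluster
* (blank RNAME plus RQMNAME makes this a queue manager alias)
DEFINE QREMOTE('LONDON1') RNAME(' ') RQMNAME('LONDON1') CLUSTER('USA')
```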
More than one bridge queue manager can be built, which would provide a resilient service bridging the two clusters.
There is a lot of definition work going on, and messages that travel across to the surviving locality are delayed even further by having to pass through the BRIDGE1 queue manager.
Advantages of this approach:
- Administration of the two clusters is kept separate

Disadvantages of this approach:
- Messages pass through a bridge queue manager, adding an extra hop and delay
- Extra alias and namelist definitions are needed on the bridge queue manager
- Uses cluster name lists to make the bridge queues visible in both clusters
- Needs extra queue manager(s) providing bridge functions
Next week… option 2, overlap the 2 clusters.
Love this story? Subscribe to the Syntegrity Solutions newsletter here and get them delivered to your inbox every month.
Neil Casey is a Senior Integration Consultant with Syntegrity. He has a solid background in enterprise computing and integration. He focuses on finding technically elegant solutions to the most challenging integration problems.