High Availability Using The Check_MK Appliance

Yet another fantastic feature that you receive with Check_MK Enterprise is the ability to easily create a high availability system using the Appliance.

The Appliance is available in two flavors: as a stand-alone rack-mountable unit, or as a virtual machine image that you can deploy on your own platform, such as ESXi.

Out of the box, the Appliance can be configured as an Active/Passive fail-over pair.
This configuration designates one system as the active primary and the other as a passive replica. In the event of a failure, the roles are reversed and you maintain complete functionality.

In this article we will cover the basic configuration to create a highly available site.
To begin, you will need two running instances of the Check_MK Appliance, each with two network interfaces.

Once the nodes are online, open a web browser and navigate to the address of one of the nodes.

We will first need to configure our secondary interfaces.
From the main menu, navigate to “Device Settings > Networking Settings” and select the “Advanced Mode” button.

The cluster requires two separate subnets to function. For our example, we will configure our secondary interface with an address outside of our normal range and use it for syncing data between the two nodes.

This interface must be configured on both systems.
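The addressing requirement can be sanity-checked before touching the web interface. The following is a minimal Python sketch, not part of the Appliance: the addresses, the node names, and the helper functions are hypothetical and only illustrate the rule that each node's two interfaces sit on separate subnets while the two sync interfaces share one.

```python
import ipaddress


def subnets_are_disjoint(iface_a: str, iface_b: str) -> bool:
    """Return True if two interface addresses (CIDR notation) are on non-overlapping subnets."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    return not net_a.overlaps(net_b)


def on_same_subnet(iface_a: str, iface_b: str) -> bool:
    """Return True if two interface addresses (CIDR notation) belong to the same subnet."""
    return ipaddress.ip_interface(iface_a).network == ipaddress.ip_interface(iface_b).network


# Hypothetical address plan: eth0 is the normal range, eth1 is the sync network.
node_plan = {
    "node1": {"eth0": "192.168.10.11/24", "eth1": "10.3.3.11/24"},
    "node2": {"eth0": "192.168.10.12/24", "eth1": "10.3.3.12/24"},
}

for node, ifaces in node_plan.items():
    ok = subnets_are_disjoint(ifaces["eth0"], ifaces["eth1"])
    print(f"{node}: eth0 and eth1 on separate subnets: {ok}")

sync_ok = on_same_subnet(node_plan["node1"]["eth1"], node_plan["node2"]["eth1"])
print(f"sync interfaces share a subnet: {sync_ok}")
```

Substitute your own addresses in `node_plan`; if either check fails, revisit the plan before configuring the interfaces.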

In real-world configurations, there are many ways to handle these interfaces, for example by using multiple VLANs or by connecting the nodes directly to one another.

Once the secondary interface is configured on both nodes, we can proceed to build the cluster.

From the main menu, navigate to “Clustering”.

Here we set the configuration options for connecting to the second node.

1) The data sync interface is the network on which we want to perform our real-time replication.

2) A minimum of two cluster communication interfaces must be set. In our example, we use eth0 and eth1.

3) The cluster IP address is a virtual address that is assigned to whichever node is acting as the primary.
This is the address that you will use to access Check_MK and should be reflected by DNS records if applicable.

4) Ping targets are ICMP-reachable hosts on the network that help determine whether one node of the cluster has a broken connection.
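To illustrate the idea behind ping targets, here is a small Python sketch of the kind of decision they support. It is an assumption about the general technique, not the Appliance's actual implementation: the function name, the target addresses, and the injected `is_reachable` callable are all hypothetical.

```python
from typing import Callable, Iterable


def node_is_isolated(ping_targets: Iterable[str],
                     is_reachable: Callable[[str], bool]) -> bool:
    """Return True if none of the configured ping targets answers.

    A node that cannot reach any target most likely has a broken network
    connection itself, so it should not take over the primary role.
    """
    targets = list(ping_targets)
    if not targets:
        raise ValueError("at least one ping target is required")
    return not any(is_reachable(t) for t in targets)


# Hypothetical targets, e.g. a gateway and a core switch.
targets = ["192.168.10.1", "192.168.10.2"]

# Simulated reachability instead of real ICMP, so the sketch runs anywhere.
simulated = {"192.168.10.1": False, "192.168.10.2": True}

if node_is_isolated(targets, lambda host: simulated[host]):
    print("no ping target answered: this node's link is probably broken")
else:
    print("at least one ping target answered: this node's link looks fine")
```

Choosing more than one target, on devices that are reliably up, keeps a single unreachable host from being mistaken for a broken node connection.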

Once “Save” is selected, you will be prompted for the password of the remote device and receive a warning stating that the remote device will be reset.

After selecting yes, a message will appear with a link to the “Cluster Status” page. Here you can view the health and synchronization status of the cluster. The initial synchronization may take up to an hour.

Links

http://mathias-kettner.com/check_mk_monitoring_appliance.html

http://mathias-kettner.com/cms_cma_cluster.html