DRBD + Heartbeat based Active/Passive High Availability Solution
We offer a 99.5% uptime guarantee with a credits clause against our openly published SLA. Some of our busy Groupware Solution servers have been running for 300+ days without a reboot. However, if a dedicated client requires 100% network uptime, with no tolerance for the downtime that normally occurs on account of hardware failure and replacement, we offer an Active/Passive failover setup on the Groupware server, available only for single-tenant client server setups. This is based on DRBD and Heartbeat. The setup is shown in the figure below:
Typical DRBD + Heartbeat Active/Passive Failover Setup.
In the solution examined, DRBD can be viewed as a resource that is controlled by Heartbeat. This ‘node management’ layer is transparent from the GROUPWARE Server software’s point of view. DRBD presents the underlying physical disk as a virtual device. This virtual device is essentially a DRBD resource that the GROUPWARE Server software uses to access the disk. When the DRBD resource is made ‘active’ (primary) on a particular node, the disk configured as a DRBD resource becomes accessible and ready for I/O operations on that node.
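To make this concrete, the relationship between DRBD and Heartbeat can be sketched with a minimal configuration. This is an illustrative example only: the resource name (r0), node names, devices, paths, and addresses are assumptions, not values from any actual deployment.

```
# /etc/drbd.conf -- minimal DRBD resource definition (illustrative)
resource r0 {
  protocol C;                    # fully synchronous replication
  on node1 {
    device    /dev/drbd0;        # virtual device seen by the application
    disk      /dev/sda7;         # underlying physical partition
    address   10.0.0.1:7788;     # replication link endpoint
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

Heartbeat then treats the DRBD device, the filesystem on top of it, and the service IP as a resource group that follows the active node, along the lines of:

```
# /etc/ha.d/haresources -- node1 is the preferred primary; on failover,
# Heartbeat promotes DRBD, mounts the filesystem, and takes the service IP
node1 drbddisk::r0 Filesystem::/dev/drbd0::/var/groupware::ext3 10.0.0.10
```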
The main advantage of using GROUPWARE Server software instead of another mailserver solution for this setup lies in its proprietary UltraStorage™ architecture, which offers powerful features such as: keeping just a ‘single copy’ of each email message and virtually linking it to multiple recipients; an internal caching layer that speeds up access to the actual email in the message store; and error-checking and recovery logic that can handle unexpected hardware crashes which would otherwise lead to subsequent consistency losses — to name just some of the technologies used.
Starting with the current version of DRBD, a disk can be mounted only from the primary node; mounting it concurrently in read-only mode from the secondary node is not allowed. This is a limitation by design: if more than one node were modifying the distributed device concurrently, deciding which part of the device is up to date on which node, or which blocks need to be resynchronized and in which direction, would become very complex. If the purpose is to allow concurrent access to the data from multiple nodes, one should consider using a shared (cluster) file system instead.
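The failover sequence that Heartbeat performs on the surviving node can be sketched with the following commands. This is a hedged operational sketch, not a deployment recipe: the resource name, device, and mount point are assumptions carried over from nothing in particular.

```
# Promote the local DRBD resource to primary (only one primary is allowed)
drbdadm primary r0

# Mount the virtual device -- this succeeds only on the primary node
mount /dev/drbd0 /var/groupware

# Attempting the same mount on the secondary node fails with an I/O error,
# since DRBD refuses reads and writes on a device in the secondary role.
```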
Due to the dual nature of the high-availability model being used, the ‘split-brain’ case must be taken into consideration. STONITH is a technique for node fencing, in which the errant node that might have run amok with cluster resources is simply ‘shot in the head’. This is where the acronym comes from: STONITH stands for ‘Shoot The Other Node In The Head’. Normally, when a high-availability system declares a node dead, it is merely speculating that it is dead. STONITH takes that speculation and makes it a reality.
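In Heartbeat, STONITH is enabled through directives in ha.cf that tell each node which fencing device to use against its peer. The sketch below assumes a serial power switch reachable over the network; the plugin name, address, and credentials are illustrative, and the exact parameters depend on the STONITH plugin in use.

```
# /etc/ha.d/ha.cf -- enabling STONITH (illustrative values)
node node1 node2
auto_failback on

# Any node (*) may fence via this power switch; parameters are
# plugin-specific (here: device address, login, password)
stonith_host * baytech 10.0.0.3 admin secretpassword
```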
GROUPWARE Server software ensures that this ‘neutralization’ of the supposedly ‘dead’ node is done without notable repercussions, by performing a proper shutdown of all the messaging resources, be they mails in the queue or message-store-related mechanisms.
DRBD essentially provides disk replication across a network. Therefore, its performance largely depends on the I/O bandwidth of the physical hard drives that are used and the network bandwidth between the primary and secondary nodes. The main concerns in such a setup are performance and availability (failover), both of them governed by disk and network I/O throughput. GROUPWARE Server software makes the solution less dependent on the aforementioned limitations thanks to its proprietary message storage, which caches both reads and writes.
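One practical consequence of this throughput coupling is that a full background resynchronization (after a node returns from downtime) competes with live replication and application I/O for the same disk and network bandwidth. DRBD therefore allows the resync rate to be capped. The fragment below is a hedged sketch; the resource name and the rate value are assumptions to be tuned to the actual disk and link capacity.

```
# /etc/drbd.conf -- capping background resynchronization bandwidth so a
# full resync does not starve live replication and application I/O
resource r0 {
  syncer {
    rate 30M;   # limit background resync to ~30 MB/s (illustrative value)
  }
}
```

A common rule of thumb is to leave the rate well below the slower of the replication link and the disk subsystem, so foreground writes keep acceptable latency during a resync.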