Setting up Database High Availability (Postgres Failover, Replication, and Management)

Cleo Clarify Server Cluster uses Postgres as its internal database. High availability refers to a failure-resistant system with higher-than-usual uptime. To achieve it, a second database that continuously replicates the changes made to the master database is added to the Clarify Server Cluster. If the “master” database goes offline, the cluster can switch to this second, or “standby”, database with minimal interruption. This greatly increases the durability of Clarify by removing the cluster’s remaining single point of failure.

The standby database is created and manually configured to replicate all changes made to the master database. The master streams all updates to the standby; the standby should never be more than one second behind the master.
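As a rough illustration of the one-second expectation, the sketch below computes replication lag from two timestamps and flags a standby that has fallen behind. The function names and the MAX_LAG constant are hypothetical and not part of Clarify. How the timestamps are obtained is also an assumption; on a Postgres standby, the replay time could come from a query such as SELECT pg_last_xact_replay_timestamp().

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold, taken from the one-second expectation described above.
MAX_LAG = timedelta(seconds=1)


def replication_lag(master_commit_time, standby_replay_time):
    """Return how far the standby trails the master (never negative)."""
    return max(master_commit_time - standby_replay_time, timedelta(0))


def is_standby_current(master_commit_time, standby_replay_time, max_lag=MAX_LAG):
    """Return True if the standby is within max_lag of the master."""
    return replication_lag(master_commit_time, standby_replay_time) <= max_lag


if __name__ == "__main__":
    commit = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    replay = commit - timedelta(milliseconds=400)
    print(is_standby_current(commit, replay))
```

Comparing the master's last commit time with the standby's last replay time, rather than comparing the replay time with the current clock, avoids reporting lag when the master is simply idle.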

Clarify continually checks the health of the master database. If it determines that the master has been unreachable for a certain (configurable) amount of time, it begins a failover process that creates special logging and directs the standby database to promote itself to master. Internal database connections are passed on to the new master (the old standby), and the Server Cluster continues operating with little to no interruption.
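The health-check logic described above can be sketched roughly as follows. This is a minimal illustration, not Clarify's actual implementation: the FailoverMonitor class, its parameters, and the two callbacks are hypothetical names, and the real product may detect outages and promote the standby differently.

```python
import time


class FailoverMonitor:
    """Tracks master health and triggers promotion after a sustained outage."""

    def __init__(self, is_master_reachable, promote_standby,
                 unreachable_timeout, clock=time.monotonic):
        self._is_master_reachable = is_master_reachable  # hypothetical health probe
        self._promote_standby = promote_standby          # hypothetical promotion hook
        self._timeout = unreachable_timeout              # configurable, in seconds
        self._clock = clock
        self._down_since = None
        self.failed_over = False

    def poll(self):
        """Run one health check; return True if failover was triggered now."""
        if self.failed_over:
            return False
        if self._is_master_reachable():
            # Any successful check resets the outage timer.
            self._down_since = None
            return False
        now = self._clock()
        if self._down_since is None:
            self._down_since = now
        if now - self._down_since >= self._timeout:
            self._promote_standby()
            self.failed_over = True
            return True
        return False
```

The outage timer resets whenever the master answers, so a brief network blip shorter than the configured timeout does not trigger a promotion. After failover the monitor stops acting, which matches the need for manual intervention before replication is available again.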

When failover occurs, Clarify attempts to set any processes in ‘mid-flight’ to a failed status. The failover process creates special log files so that the user can reconcile any error messages with the logs in the Auditor.

Manual intervention is then required, either to reconfigure the old master to replicate once it is brought back online or to set up a new standby database. See How to Reset database replication after failover.

Requirements for failover

  • For the automatic failover process to work, Clarify must be configured with two database hosts: one must be the master and the other a replica (standby).
  • Both databases must be running when Clarify Server Cluster starts; otherwise, Clarify won’t know that a standby is available.
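
The two requirements above can be expressed as a simple pre-start check. The sketch below is illustrative only: the DatabaseHost record, its fields, and the failover_problems function are hypothetical and do not reflect Clarify's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseHost:
    name: str
    role: str      # assumed values: "master" or "standby"
    running: bool  # whether the host answered when the cluster started


def failover_problems(hosts):
    """Return the reasons automatic failover would not be available."""
    problems = []
    if len(hosts) != 2:
        problems.append(f"expected 2 database hosts, found {len(hosts)}")
    if sorted(h.role for h in hosts) != ["master", "standby"]:
        problems.append("hosts must be exactly one master and one standby")
    for h in hosts:
        if not h.running:
            problems.append(f"{h.name} is not running at startup")
    return problems


if __name__ == "__main__":
    hosts = [
        DatabaseHost("db1", "master", True),
        DatabaseHost("db2", "standby", False),
    ]
    for problem in failover_problems(hosts):
        print(problem)
```

An empty list means both requirements are met; any entry describes a condition under which the cluster would start without a usable standby.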