PostgreSQL Fault Tolerance

By Ronald Valente.

Before we get started I want to go over what it means to be highly available and fault tolerant: having a pair of machines (the minimum!) in a cluster that provides redundancy for a service or set of services. High availability is usually accomplished with a cluster framework like the one we will be using in this post. That said, there are a lot of tutorials online that show only a partial implementation of this setup. My objective here is to provide a more complete resource in one place. Even so, this post still only scratches the surface of what you can do with Pacemaker/CoroSync.

Objective

The objective of this post is not to go into the exhaustive details of setting up Ubuntu, VMware, or PostgreSQL. While all of that is extremely important, it would make for a very long post that would be disjointed and hard to follow. Instead I have decided to focus on the fault-tolerance pieces. I will put each section into context for maximum knowledge osmosis (in this case, knowledge is the water).

Setup a cluster of two VMware VMs running Pacemaker/CoroSync/DRBD to provide a fault tolerant PostgreSQL service.

Requirements

You should have a strong understanding of Linux system administration before you go through this post. In addition to system administration, you should understand the concepts and objectives of fault tolerance. There is no sense in going through all this trouble if you really don't need to. In this post I will be going over the details to build an HA active/passive pair of VMs running PostgreSQL.

Infrastructure

Here are the pieces of the infrastructure that were already in place before I started writing this post. I will not be going into too much detail with regards to VMware vSphere 5 or Ubuntu 12.04 LTS. VMware vSphere 5 is what provides fencing in this setup, although other fencing options exist. I recommend you use an Ubuntu 12.04 LTS template so that all your VMs are built from a single base VM; the obvious benefit is time savings.

Software

Below are the software packages that make all the magic happen. Pacemaker and Corosync are the pieces of cluster software that do all the heavy lifting. DRBD handles the replication between the two secondary VMDK files for the database. Last but not least is PostgreSQL, the database we want to make fault tolerant.

Virtual Machines

As I said earlier, we are using VMware vSphere 5. This is a critical piece of the implementation because vCenter provides us with the fencing ability, which is extremely important in a high availability cluster. Whenever you add machines that can access the same device, things can get hairy. To handle that, we set up STONITH (shoot the other node in the head) fencing to provide a failsafe way to take a node offline. This ensures data integrity on your shared disk(s).

Hard Drives

You will need an OS drive and a database drive. I will not go into details on the best settings for database drives; since this is a post on fault tolerance and not PostgreSQL itself, any size will do. For this tutorial I used an 8GB OS drive and a 4GB DB drive, which is plenty for a proof of concept.

Networking

You will want two networks for this setup: one provides LAN access to the PostgreSQL server and the other provides a "crossover cable" between the two VMs. In my setup I have a single ESXi host, so extra upstream networking or a dvSwitch is not required. If you have multiple hosts, you will want either a dvSwitch or a VLAN that spans multiple hosts.

Once you have your virtual machines deployed we are ready to get started!

Topology

Below outlines the basic setup that we will be completing in this post.

Cluster Topology

Below is a two-node active/passive cluster. As you can see, the two nodes are on the bottom, and on top of them is the cluster software (Pacemaker and Corosync). Above that layer we have all the additional pieces we add on. The shared storage (DRBD) is set up as master/slave; we run master/slave because we want replication to run continuously, otherwise we would not have a hot standby database. Next in the stack are the filesystem and the virtual IP; these two resources can only be on a single node at a time (more on that later). Last but certainly not least is the PostgreSQL database, which runs on top of and depends on the lower three pieces.

+-------------+
| Database    |
+-------------+------------+
| Filesystem  | IP Address |
+-------------+------------+
| DRBD Master | DRBD Slave |
+-------------+------------+
|         Pacemaker        |
+--------------------------+
|         CoroSync         |
+-------------+------------+
|   Node 1    |   Node 2   |
+-------------+------------+

Cluster Networking

Shown below is the networking used in this example. Essentially we have two networks, one public and one private. 192.168.0.0/24 is the public network and is routed. The 10.0.0.0/24 network is a private network used only between the cluster nodes. The virtual IP that is shared between the hosts lives on the public network.

+-------------------------------+
|     dev-pg (192.168.0.3)      |
+---------------+---------------+
|  192.168.0.1  |  192.168.0.2  |
+---------------+---------------+
|    Node 1     |    Node 2     |
+---------------+---------------+
| 10.0.0.1 | Private | 10.0.0.2 |
+----------+---------+----------+
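
For reference, a minimal `/etc/network/interfaces` sketch for dev-pg1 might look like the following. This is purely illustrative: the interface names and the exact addressing are assumptions based on the diagram above, so adjust them for your environment (and use 192.168.0.2 and 10.0.0.2 on dev-pg2).

# /etc/network/interfaces (dev-pg1) - hypothetical example
# Public LAN interface
auto eth0
iface eth0 inet static
  address 192.168.0.1
  netmask 255.255.255.0

# Private "crossover" interface for DRBD and the cluster heartbeat
auto eth1
iface eth1 inet static
  address 10.0.0.1
  netmask 255.255.255.0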

Host Setup Introduction

During this section I will preface every command you enter in the CLI with a `$`. If there is a `hostname$` you will run that command on that host only; if there is no hostname you will run it on BOTH hosts. The output will follow directly after, without a `$`, in the code block. See the example below.

# Run this on both hosts
$ hostname
dev-pg1

# Run this only on dev-pg1
dev-pg1$ hostname
dev-pg1

---

Shared Storage (DRBD)

Device Setup

Before we dive into DRBD we need to set up the physical device that will be replicated on both sides. We have added a second hard disk to our VMs at /dev/sdb. We will create a partition and stop there.

$ sudo fdisk /dev/sdb
> n
> p
> 1
> [Enter]
> [Enter]
> w
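
If you prefer a non-interactive approach, the same single partition can be created with parted instead. This is just a sketch; adjust the device name if your second disk is not /dev/sdb.

$ sudo parted -s /dev/sdb mklabel msdos
$ sudo parted -s /dev/sdb mkpart primary 0% 100%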

Once you are done writing the partition table you can move on to the next step, where we will install DRBD and get started configuring the resource!

Installation

As you know by now, we are using DRBD for our replication. This provides us with a simple shared device with replication to a secondary host. For extensive documentation on DRBD, please see their user's guide.

Now it is time to get started on the host configuration; the first part is the initial setup of the DRBD shared disk. Setting up the block device now sets the stage for the replicated PostgreSQL servers.

# Install the DRBD Package
$ sudo apt-get install drbd-utils

# Load the Kernel Module
$ sudo modprobe drbd

Note: If you get an error loading the kernel module, you likely have the `linux-virtual` kernel, and you have a couple of options. You can either install the `linux-server` package, which includes the full kernel, or build and load the module from source.
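
For example, one way out of that situation is to install the full server kernel and reboot into it before loading the module again. This sketch assumes the stock Ubuntu 12.04 `linux-server` metapackage is what you want; the source-build route is not shown here.

# Install the full server kernel, then reboot into it
$ sudo apt-get install linux-server
$ sudo reboot

# After the reboot, try loading the module again
$ sudo modprobe drbd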

Checkpoint

You should see output similar to the following when you run this command.

$ cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
srcversion: 71955441799F513ACA6DA60

Configuration

We will be using DRBD to provide replication; the first step is to configure the resource file. Create a resource file on both nodes that looks like the following. Remember to replace `PRIVATE_IP` with the private, non-routable IP address that you assigned to each node. In the topology above it is `10.0.0.1` and `10.0.0.2` for `dev-pg1` and `dev-pg2` respectively.

# /etc/drbd.d/r0.res
resource r0 {
  device /dev/drbd0;
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  meta-disk internal;
  syncer {
    rate 40M;
  }
  on dev-pg1 {
    address PRIVATE_IP:7789;
    disk    /dev/sdb1;
  }
  on dev-pg2 {
    address PRIVATE_IP:7789;
    disk    /dev/sdb1;
  }
}

Now that we have the configuration files created on both nodes we can set up the distributed disk.

# We need to create the metadata on the r0 resource
$ sudo drbdadm create-md r0
  Writing meta data...
  initializing activity log
  NOT initialized bitmap
  New drbd meta data block successfully created.
  success

# Now bring up the r0 resource
$ sudo drbdadm up r0

# Check the status
$ sudo drbd-overview
  0:r0  Connected Secondary/Secondary Inconsistent/Inconsistent C r-----

Note: Running `sudo drbdadm up` is the same as running the following three commands: `sudo drbdadm attach`,`sudo drbdadm syncer`,`sudo drbdadm connect`.

# The command below will overwrite everything on the secondary device; exercise caution if you have data on it.
$ sudo drbdadm -- --overwrite-data-of-peer primary r0

# Check the Status
$ sudo drbd-overview
  0:r0  SyncSource Primary/Secondary UpToDate/Inconsistent C r-----
    [>...................] sync'ed:  5.9% (1977244/2096028)K

Now we can format the shared block device; for this setup we will be using ext4. We do not need a clustered filesystem because we are setting up an active/passive cluster. If you need an active/active setup, you would want to look into using a cluster filesystem. Run this only on `dev-pg1`.

dev-pg1$ sudo mkfs.ext4 /dev/drbd0

Now we want to mount the shared block device where PostgreSQL will use it. To do this we need to create the directory for the mount and then add an entry to the fstab. Do this on both nodes.

$ sudo mkdir -p /var/lib/postgresql/9.1/main
$ sudo vim /etc/fstab

Add the following line to the end of the `/etc/fstab` file on both `dev-pg1` and `dev-pg2`.

# /etc/fstab
/dev/drbd0      /var/lib/postgresql/9.1/main    ext4    noauto  0       0

To test the mount after you have completed the above steps, run the following:

dev-pg1$ sudo mount /var/lib/postgresql/9.1/main
dev-pg1$ mount
  /dev/drbd0 on /var/lib/postgresql/9.1/main type ext4 (rw)
# We need to remove the lost+found folder in order for PostgreSQL to install correctly.
dev-pg1$ sudo rmdir /var/lib/postgresql/9.1/main/lost+found

Failover Test

We have created and mounted the DRBD shared block device on dev-pg1. Now it's time to unmount the shared device from `dev-pg1` and mount it on `dev-pg2`.

# First unmount the filesystem.
dev-pg1$ sudo umount /var/lib/postgresql/9.1/main

# Check DRBD Status
dev-pg1$ sudo drbd-overview
  0:r0  Connected Primary/Secondary UpToDate/UpToDate C r-----

# Then we make DRBD become a secondary on dev-pg1
dev-pg1$ sudo drbdadm secondary all

# Check DRBD Status
dev-pg1$ sudo drbd-overview
  0:r0  Connected Secondary/Secondary UpToDate/UpToDate C r-----

# Make DRBD Primary on dev-pg2
dev-pg2$ sudo drbdadm primary all

# Check DRBD Status
dev-pg2$ sudo drbd-overview
  0:r0  Connected Primary/Secondary UpToDate/UpToDate C r-----

# Now we can mount the DRBD block device
dev-pg2$ sudo mount /var/lib/postgresql/9.1/main

# Check Mount Status
dev-pg2$ mount
  /dev/drbd0 on /var/lib/postgresql/9.1/main type ext4 (rw)

Now that we have a shared filesystem that we can fail over, it is time to set up PostgreSQL. Before we get started, go ahead and fail the DRBD filesystem back over to `dev-pg1`. In case you were not paying attention, you can rerun the steps above with the hostnames swapped, as sketched below. Then you are ready to move on to the next section.
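
For completeness, here is a minimal sketch of the failback, using the same commands from the failover test with the hostnames swapped:

# Unmount and demote on dev-pg2
dev-pg2$ sudo umount /var/lib/postgresql/9.1/main
dev-pg2$ sudo drbdadm secondary all

# Promote and mount on dev-pg1
dev-pg1$ sudo drbdadm primary all
dev-pg1$ sudo mount /var/lib/postgresql/9.1/main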

PostgreSQL Setup

Installation

# Go ahead and install PostgreSQL on the dev-pg1 node now.
dev-pg1$ sudo apt-get install postgresql

# You can see we started using space on the DRBD disk
dev-pg1$ df -kh
  /dev/drbd0                  2.0G   96M  1.9G   5% /var/lib/postgresql/9.1/main

Now you will install PostgreSQL on `dev-pg2` and remove the data under /var/lib/postgresql/9.1/main.

dev-pg2$ sudo apt-get install postgresql
dev-pg2$ sudo /etc/init.d/postgresql stop
dev-pg2$ sudo su -
dev-pg2(root)$ cd /var/lib/postgresql/9.1/main
dev-pg2(root)$ rm -rf *
dev-pg2(root)$ exit

Failover Test (Again)

Now that we have PostgreSQL installed on both nodes we can test the failover functionality.

# Bring PostgreSQL Offline on dev-pg1
dev-pg1$ sudo /etc/init.d/postgresql stop
dev-pg1$ sudo umount /var/lib/postgresql/9.1/main
dev-pg1$ sudo drbdadm secondary all

# Bring PostgreSQL Online on dev-pg2
dev-pg2$ sudo drbdadm primary all
dev-pg2$ sudo mount /var/lib/postgresql/9.1/main
dev-pg2$ sudo /etc/init.d/postgresql start
dev-pg2$ sudo -u postgres psql
  psql (9.1.4)
  Type "help" for help.

  postgres=#

Before we finish up we want to fail the DRBD device back to dev-pg1; this time we will NOT start the PostgreSQL server. Leave it off for now.

In order to ensure that PostgreSQL doesn't start up automatically on boot, we will disable the init script.

# Stop and Disable PostgreSQL on dev-pg1 (primary)
dev-pg1$ sudo /etc/init.d/postgresql stop
dev-pg1$ sudo update-rc.d postgresql disable
dev-pg1$ sudo umount /var/lib/postgresql/9.1/main

# Stop and Disable DRBD on dev-pg2 (secondary)
dev-pg2$ sudo /etc/init.d/drbd stop
dev-pg2$ sudo update-rc.d drbd disable

# Stop and Disable DRBD on dev-pg1 (primary)
dev-pg1$ sudo /etc/init.d/drbd stop
dev-pg1$ sudo update-rc.d drbd disable

It is that simple! You now have an HA pair of PostgreSQL servers! They are not safe by any means, but you have something to work with now. What do I mean by not safe? You have no fencing, no heartbeat, nothing. This is NOT something that you would want to have in production or even development! You would be better off with a single node and backups at this point. Now go ahead and finish reading; the next section is what you came for anyway.

Take a deep breath, get a cup of coffee, and enjoy!

Cluster Setup

The cluster setup section will take you to the end of this post. It is broken into multiple sections, one for each piece that makes up the HA cluster. First we will go over CoroSync, which is the heartbeat portion of the cluster. Once we finish the CoroSync setup we will move on to Pacemaker, which is the cluster resource manager. Once you are done configuring Pacemaker you are all set!

Installing CoroSync is quite easy. When you install Pacemaker on Ubuntu 12.04 LTS you get all the necessary parts to set up the entire cluster.

# Install Pacemaker on both nodes
$ sudo apt-get install pacemaker

Now that Pacemaker and CoroSync are installed we can move to the configuration section.

CoroSync

There is only one simple change to make in `/etc/corosync/corosync.conf`, and that is the network setup. Search for the `interface {` configuration block and update the `bindnetaddr:` field; we will be using the private network for the heartbeat.

interface {
  ringnumber: 0
  bindnetaddr: 10.0.0.0
  mcastaddr: 226.94.1.1
  mcastport: 5405
}

Next we want to update the `service {` block to set the Pacemaker version to 1, which means Pacemaker runs as its own service instead of being started by CoroSync.

service {
   # Load the Pacemaker Cluster Resource Manager
   ver:       1
   name:      pacemaker
}

Lastly, before you leave the editor you'll need to enable quorum. Add the following to the end of the `/etc/corosync/corosync.conf` configuration file.

quorum {
       provider: corosync_votequorum
       expected_votes: 2
}

Note: It is good practice to add each node to the local `/etc/hosts` file.
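
A minimal sketch of those `/etc/hosts` entries, using the private heartbeat addresses from the topology above (adjust if you prefer to resolve the node names to their public addresses instead):

# /etc/hosts
10.0.0.1    dev-pg1
10.0.0.2    dev-pg2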

Now that you have CoroSync configured, you need to tell the system to start CoroSync at boot.

$ sudo vim /etc/default/corosync
# start corosync at boot [yes|no]
START=yes
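
If you would rather not open an editor, the same change can be made with a one-liner (assuming the file ships with the default `START=no`):

$ sudo sed -i 's/^START=no/START=yes/' /etc/default/corosync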

Once that is configured you can start the CoroSync service on both nodes.

$ sudo /etc/init.d/corosync start

Verify that CoroSync is running

$ ps -ef | grep corosync
root      1060     1  0 Jul26 ?        00:00:55 /usr/sbin/corosync

Once you have verified that the process is running you can test the config.

$ sudo corosync-cfgtool -s
Printing ring status.
Local node ID 16777226
RING ID 0
  id  = 10.0.0.1
  status  = ring 0 active with no faults

Also check that quorum is configured.

$ sudo corosync-quorumtool -l
Nodeid     Votes  Name
16777226     1
33554442     1

Troubleshooting

If you get output similar to the following, you will need to check your configuration or restart the CoroSync process.

$ sudo corosync-quorumtool -l
Nodeid     Name
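
If that happens, double-check `/etc/corosync/corosync.conf` on the affected node and restart the daemon before checking the quorum output again:

$ sudo /etc/init.d/corosync restart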

Now that you have CoroSync online and working it is time to move to the Pacemaker portion of the cluster setup. This is where the fun begins!

Pacemaker

Installation

First things first, we need to start Pacemaker.

# First we will start Pacemaker on dev-pg1
dev-pg1$ sudo /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [  OK  ]
dev-pg1$ sudo update-rc.d pacemaker defaults

# Once that finishes, let's start it on dev-pg2
dev-pg2$ sudo /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: [  OK  ]
dev-pg2$ sudo update-rc.d pacemaker defaults

Perfect! Now we can ensure that both nodes are online and talking with each other by running the following:

$ sudo crm_mon -1
============
Last updated: Wed Aug  8 11:06:49 2012
Last change: Wed Aug  8 10:03:43 2012 via crmd on dev-pg1
Stack: openais
Current DC: dev-pg1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ dev-pg1 dev-pg2 ]

On a clean install you can see the base configuration of the Pacemaker cluster.

dev-pg1$ sudo crm configure show
node dev-pg1
node dev-pg2
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2"

If you care to see the xml configuration file, you can do that as well.

$ sudo crm configure show xml

During the next steps, you will only run the commands on dev-pg1; this is because Pacemaker handles syncing the configuration between the nodes.

Configuration

The first thing we will do is verify our configuration; memorize this command.

$ sudo crm_verify -L

You should receive output that looks similar to the following:

crm_verify[9973]: 2012/08/08_11:29:08 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[9973]: 2012/08/08_11:29:08 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[9973]: 2012/08/08_11:29:08 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details

This is because we have STONITH enabled but not configured. For now we are going to disable STONITH until we get fencing set up.

WARNING: Running a cluster without fencing is VERY DANGEROUS, if you do this, it is worse than running a single node. Data corruption and end of the world type stuff will happen.

Disable STONITH

$ sudo crm configure property stonith-enabled=false
$ sudo crm_verify -L

Set up some good active/passive defaults:

dev-pg1$ sudo crm configure rsc_defaults resource-stickiness=100
dev-pg1$ sudo crm configure property no-quorum-policy=ignore

Now verify the configuration:

$ sudo crm configure show
node dev-pg1
node dev-pg2
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false" \
  last-lrm-refresh="1344447188" \
  no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
  resource-stickiness="100"

Now you should see no output when you run the verify command.

Set up the shared IP of the cluster.

sudo crm configure primitive ip_postgres ocf:heartbeat:IPaddr2 \
  params ip=192.168.0.3 cidr_netmask=28 \
  op monitor interval=30s

Now if you run `sudo crm configure show` you will get output that looks similar to the following:

sudo crm configure show
node dev-pg1
node dev-pg2
primitive ip_postgres ocf:heartbeat:IPaddr2 \
  params ip="192.168.0.3" cidr_netmask="28" \
  op monitor interval="30s"
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  stonith-enabled="false" \
  no-quorum-policy="ignore" \
  last-lrm-refresh="1343213649"
rsc_defaults $id="rsc-options" \
  resource-stickiness="100"

Now that the configuration is active, we can check to make sure the IP is started on a host by running `sudo crm_mon -1`; we should see something similar to:

$ sudo crm_mon -1
============
Last updated: Wed Jul 25 07:18:34 2012
Last change: Wed Jul 25 07:17:09 2012 via cibadmin on dev-pg2
Stack: openais
Current DC: dev-pg1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ dev-pg1 dev-pg2 ]

 ip_postgres  (ocf::heartbeat:IPaddr2): Started dev-pg1

Now that you have the IP configured and enabled, you should be able to ping your ip_postgres.

$ ping dev-pg
PING dev-pg.domain.tld (192.168.0.3) 56(84) bytes of data.
64 bytes from dev-pg.domain.tld (192.168.0.3): icmp_req=1 ttl=64 time=0.273 ms
64 bytes from dev-pg.domain.tld (192.168.0.3): icmp_req=2 ttl=64 time=0.125 ms
--- dev-pg.domain.tld ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.125/0.199/0.273/0.074 ms

In order to test failover we can simulate a failure on the node that has the IP, dev-pg1 in this case.

Initiate a ping from your local machine to the ip_postgres, dev-pg. Once that is running stop Pacemaker and CoroSync on dev-pg1.

# Stop Pacemaker
dev-pg1$ sudo /etc/init.d/pacemaker stop
  Signaling Pacemaker Cluster Manager to terminate: [  OK  ]
  Waiting for cluster services to unload:......[  OK  ]

# Stop CoroSync
dev-pg1$ sudo /etc/init.d/corosync stop
  Stopping corosync daemon corosync [ OK ]

Now that dev-pg1 is effectively down, we can verify this status.

# Verify dev-pg1 shows OFFLINE
dev-pg2$ sudo crm_mon -1
============
Last updated: Wed Aug  8 11:42:31 2012
Last change: Wed Aug  8 11:33:17 2012 via cibadmin on dev-pg1
Stack: openais
Current DC: dev-pg2 - partition WITHOUT quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ dev-pg2 ]
OFFLINE: [ dev-pg1 ]

 ip_postgres  (ocf::heartbeat:IPaddr2): Started dev-pg2

Note: Not a single ping was dropped during that sequence!

Now let's bring the dev-pg1 node back online.

# Start CoroSync
dev-pg1$ sudo /etc/init.d/corosync start
  Starting corosync daemon corosync [ OK ]

# Start Pacemaker
dev-pg1$ sudo /etc/init.d/pacemaker start
  Starting Pacemaker Cluster Manager: [  OK  ]

Verify that dev-pg1 is reporting online in the crm monitor.

dev-pg1$ sudo crm_mon -1
============
Last updated: Wed Aug  8 11:45:22 2012
Last change: Wed Aug  8 11:33:17 2012 via cibadmin on dev-pg1
Stack: openais
Current DC: dev-pg2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ dev-pg1 dev-pg2 ]

 ip_postgres  (ocf::heartbeat:IPaddr2): Started dev-pg2

Note: The ip_postgres did NOT move back to dev-pg1. This is the expected and preferred result!

To ensure healthy resources do not move when a host comes back online, set the following configuration (highly recommended): `sudo crm configure rsc_defaults resource-stickiness=100`.

Let's set up DRBD using Pacemaker. First we will create the DRBD primitive:

sudo crm configure primitive drbd ocf:linbit:drbd \
params drbd_resource="r0" \
op start timeout="240s" \
op stop timeout="100s" \
op monitor interval="29s" role="Master" timeout="10s" \
op monitor interval="31s" role="Slave" timeout="20s"

Now we need to create the master/slave relationship for the drbd device; this is done by creating an ms resource.

sudo crm configure ms ms_drbd drbd \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

Verify that we have our DRBD block device set up and working.

$ sudo crm_mon -1
============
Last updated: Wed Aug  8 13:52:02 2012
Last change: Wed Aug  8 13:51:57 2012 via cibadmin on dev-pg1
Stack: openais
Current DC: dev-pg2 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ dev-pg1 dev-pg2 ]

 ip_postgres  (ocf::heartbeat:IPaddr2): Started dev-pg1
 Master/Slave Set: ms_drbd [drbd]
   Masters: [ dev-pg2 ]
   Slaves: [ dev-pg1 ]

Once this is verified, it is time to add a filesystem resource to Pacemaker, specifying the block device it lives on and the filesystem type.

sudo crm configure primitive fs_postgres ocf:heartbeat:Filesystem \
  params device="/dev/drbd0" \
  directory="/var/lib/postgresql/9.1/main" fstype="ext4" \
  op start interval="0" timeout="60s" \
  op stop interval="0" timeout="60s"

We want to ensure that the filesystem is located on the node where DRBD is master.

sudo crm configure colocation fs_on_drbd inf: fs_postgres ms_drbd:Master

Now run `crm_mon -1` to see how things are shaping up. If you see any `Failed actions:` at the bottom similar to the following:

Failed actions:
drbd_monitor_0 (node=dev-pg2, call=23, rc=6, status=complete): not configured

Then run the following command:

$ sudo crm_resource -P
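
If a stale failed action lingers after the re-probe, another option with the crm shell is to clean up the offending resource directly (here the `drbd` primitive from the failed action above):

$ sudo crm resource cleanup drbd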

Configure PostgreSQL for Clustering

Tell PostgreSQL to listen on all interfaces by making the following change to your `postgresql.conf`.

# /etc/postgresql/9.1/main/postgresql.conf
listen_addresses = '*'

Now we will run the following to create the postgresql resource. On Ubuntu we need to point the resource agent at a few paths because they are not in the default PostgreSQL locations.

sudo crm configure primitive postgresql ocf:heartbeat:pgsql \
  params config="/etc/postgresql/9.1/main/postgresql.conf" \
  params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" \
  params pgdata="/var/lib/postgresql/9.1/main" \
  op start interval="0" timeout="120s" \
  op stop interval="0" timeout="120s"

Now that we have created all of our resources we can group them all together in a resource named "database". This will allow us to start and migrate all the resources with a single command.

sudo crm configure group database fs_postgres ip_postgres postgresql

Now we need to ensure that the database filesystem is always on the same node as the database; if these were split up, the database would not run.

sudo crm configure colocation postgresql_on_drbd inf: database ms_drbd:Master

Last but definitely not least, we need to ensure we don't start the database until the master/slave drbd device is online and ready.

sudo crm configure order postgresql_after_drbd inf: ms_drbd:promote database:start
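
With the group, colocation, and ordering in place, a quick way to exercise a controlled failover is to migrate the whole database group with the crm shell. This is a sketch; `unmigrate` clears the temporary location constraint that the migration creates.

# Move the database group to dev-pg2
dev-pg1$ sudo crm resource migrate database dev-pg2

# Check where everything landed
dev-pg1$ sudo crm_mon -1

# Clear the migration constraint when you are done
dev-pg1$ sudo crm resource unmigrate database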

STONITH

STONITH is arguably the most important piece of an HA cluster. It ensures that we handle unsavory cluster conditions by aggressively killing off machines that think they are primary nodes when they are not. This is easier with physical servers, where you can do something simple like an IPMI call or cutting the power on a PDU. In this case we need to use the VMware vCenter API to kill the VM in question.

Installation

First we need to install the vSphere Perl SDK; start by downloading it from VMware's website. Once that is done, go ahead and install the prerequisites for the SDK.

sudo apt-get install libarchive-zip-perl libcrypt-ssleay-perl \
libclass-methodmaker-perl libuuid-perl libsoap-lite-perl \
libxml-libxml-perl libdata-dump-perl perl-doc libssl-dev

After the prerequisites are installed, download the VMware CLI from vmware.com to your local computer, then upload it to both nodes.

scp ~/Downloads/VMware-vSphere-CLI-5.0.0-615831.x86_64.tar.gz administrator@dev-pg1:
scp ~/Downloads/VMware-vSphere-CLI-5.0.0-615831.x86_64.tar.gz administrator@dev-pg2:

Now that the upload is complete we can install the SDK.

tar -xzf VMware-vSphere-CLI-5.0.0-615831.x86_64.tar.gz
cd vmware-vsphere-cli-distrib
sudo ./vmware-install.pl

Now that we have installed the vSphere CLI (which comes with the Perl SDK), we can store our vCenter credentials!

/usr/lib/vmware-vcli/apps/general/credstore_admin.pl add -s dev-vc.domain.tld -u stonith -p 'ZSE$xdr5'

After you have created your credential file you need to copy it to a shared location:

sudo cp -p $HOME/.vmware/credstore/vicredentials.xml /etc

Add the following line to the `/usr/lib/stonith/plugins/external/vcenter` file after `use warnings`.

$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

Test out the current connection, on both nodes.

stonith -t external/vcenter VI_SERVER="dev-vc.domain.tld" \
VI_CREDSTORE=/etc/vicredentials.xml \
HOSTLIST="dev-pg1=dev-pg1;dev-pg2=dev-pg2" \
RESETPOWERON=0 -lS

You should see output that looks like the following:

info: external/vcenter device OK.
dev-pg1
dev-pg2

Configuration

Now that we have set up and tested vCenter fencing, it is time to configure fencing in Pacemaker permanently. Run the following configuration.

sudo crm configure primitive vfencing stonith::external/vcenter params \
VI_SERVER="dev-vc.domain.tld" VI_CREDSTORE="/etc/vicredentials.xml" \
HOSTLIST="dev-pg1=dev-pg1;dev-pg2=dev-pg2" RESETPOWERON="0" \
op monitor interval="60s"
sudo crm configure clone Fencing vfencing

Now that you have your configuration set and fencing ready to go, you can enable STONITH fencing by entering the following command:

sudo crm configure property stonith-enabled="true"

Make sure everything in your config checks out:

sudo crm_verify -L

Now that you have fencing all set up, it is time to actually test the fencing procedure. At this point, any important data on this setup should already be backed up!

Before you test fencing you will want to put a node in standby.

$ sudo crm standby dev-pg1
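
When you are finished testing, the node can be brought back out of standby with the crm shell (a sketch):

$ sudo crm node online dev-pg1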

You have a few scenarios to test: they are "reset", "off", and "on".

stonith -t external/vcenter VI_SERVER="dev-vc.domain.tld" \
VI_CREDSTORE="/etc/vicredentials.xml" \
HOSTLIST="dev-pg1=dev-pg1;dev-pg2=dev-pg2" \
RESETPOWERON=0 -T reset dev-pg1

After you run that command you should see the action being performed at the vCenter level. Remember, this is not a graceful reset or power-off; use it with caution! You will see output like this from the command:

external/vcenter[26621]: info: Machine tst-esx-03.domain.tld:dev-pg1 has been reset
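
The "off" and "on" scenarios can be exercised the same way by changing the `-T` action; these are hard power operations, so use the same caution:

stonith -t external/vcenter VI_SERVER="dev-vc.domain.tld" \
VI_CREDSTORE="/etc/vicredentials.xml" \
HOSTLIST="dev-pg1=dev-pg1;dev-pg2=dev-pg2" \
RESETPOWERON=0 -T off dev-pg1

stonith -t external/vcenter VI_SERVER="dev-vc.domain.tld" \
VI_CREDSTORE="/etc/vicredentials.xml" \
HOSTLIST="dev-pg1=dev-pg1;dev-pg2=dev-pg2" \
RESETPOWERON=0 -T on dev-pg1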

Connectivity Tests

Now that we have a working and fenced cluster, we want to ensure that the nodes our system runs on have connectivity to our gateway. This is very useful if you have HSRP or some other failover gateway configured by your network team. If there are connectivity issues, we won't leave our services hanging on a server that is not reachable.

First off we want to create the ping resource; this will ping the gateway from each node.

sudo crm configure primitive ping ocf:pacemaker:ping \
params host_list="192.168.0.1" \
op start interval="0" timeout="60s"

Now we will clone the resource so that it is running on both nodes at the same time.

sudo crm configure clone Connected ping meta interleave="true"

Finally, we need to define a location constraint so this has an effect on our cluster. We will start with a simple rule.

sudo crm configure location Connectivity database \
rule pingd: defined pingd

Testing

sudo crm_attribute -G -t status -N dev-pg1,dev-pg2 -n pingd

Resources