prongs.org - PostgreSQL Streaming Replication

PostgreSQL Streaming Replication

Afra Ahmad, July 08 2014

This outline on how to set up streaming replication within PostgreSQL is meant for versions 9.3 and 9.4. Also covered here is WAL archiving which should also be setup in conjuction with replication.

Getting Started

The setup and installation of PostgreSQL must have been already done on both the master and replica machines. In addition, master's postgres user must be able to SCP directly to the replica via SSH keys. This guide explains how to set up SSH keys between two hosts.

Settng up Streaming Replication with WAL Archiving

PostgreSQL tracks changes made to the database data files via a log called the write ahead log (WAL). The log exists mainly in case there is a system crash, so the database can be restored to a correct state by “replaying” the log entries made since the last checkpoint.

This method of relaying changes gives PostgreSQL users yet another option for backing up a database, in that one can combine the regular file system level backups with the backup of of the WAL files.

To enable the archiving of the WAL files onto another machines, whether it’s a replica database, an NFS mount or other media, PostgreSQL does not make any assumptions on how it is done and gives the flexibility for the administrator to decide.

In our example, we will be shipping the logs to the replica machine, pg_replica. On the master server, pg_master, we will have to activate the archive mode. In conjuction with this method, we will also set up PostgreSQL's internal streaming replication service. Within master’s postgresql.conf file, we can begin to change some of the defaults:

wal_level = hot_standby
archive_mode = on
archive_command = ‘/var/lib/pgsql/data/ship_logs.sh %p %f  < /dev/null
max_wal_senders = 3
checkpoint_segments = 8 
wal_keep_segments = 8

(Here, I am assuming $PG_DATA is /var/lib/pgsql/data/ - you can change it if not ).

Here is the corresponding ship_logs.sh shell script:

#!/bin/sh
#
## This is called to back up the WAL files to the replica. 
## This is on top of replication and is used as another 
## method to secure data successfully transferring from one 
## database server to another.

ARCHIVE_DIR_ON_REPLICA ="/var/lib/pgsql/walfiles"
LOG=1
LOG_FILE="/tmp/postgres_wal_archiving.log"

log() { echo "`date --rfc-3339=ns` $1" >> "$LOG_FILE"; }
log_error() { echo "`date --rfc-3339=ns` $1" >> "$LOG_FILE"; exit 1; }

wal_path="$1"
wal_file="$2"
backup_server="db02.example.host"

if [ $LOG -eq 1 ]
then
   log "Transfering file to backup server, filename: $wal_file"
fi

rsync -e "ssh -p 26" "$wal_path" "$backup_server:$ARCHIVE_DIR_ON_REPLICA"

if [ $LOG -eq 1 ]
then
   if [ "$?" -eq 0 ]; then
     log "Transfer to replica server completed"
   else
     log_error "Sending $wal_file failed."
   fi
fi

Firstly, the directory mentioned above, ARCHIVE_DIR_ON_REPLICA, can be a network mount or a local partition to the replica. It must first be created on the replica:

$ mkdir /var/lib/pgsql/walfiles
$ chown postgres.postgres /var/lib/pgsql/walfiles
$ chmod 700 /var/lib/pgsql/walfiles

Also, in the script above, you'll notice the LOG paramter. This is recommended to use at first to see if all is working well. It essentially logs all actions to LOG_FILE; if set to 0, it disables this. Once the script has been saved on the master, from the shell, make it executable:

$ chmod +x /var/lib/pgsql/data/ship_logs.sh

This script needs to be run once as the postgres user to confirm the files can be copied over. One will have to accept the key once they run it. If all runs well, then this process is set up.

We are now ready to create a PostgreSQL user with replication privilege on the master:

$ sudo -u postgres psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'some_password';"

Create the corresponding entry on the master's pg_hba.conf:

host replication     replicator      10.0.0.2/32            md5

Here, 10.0.0.2 is the IP address of the replica. One can also specify hostssl for connecting over the SSL protocol, but this is beyond the scope of this document.

After a restart of the PostgreSQL engine on the master server, PostgreSQL will start the continuous archiving of logs and moving them to the archive directory /var/lib/pgsql/walfiles over on the replica server. This will occur when the logs reach the check point size which is usually at 16MB (this setting can be tweaked on postgresql.conf on master but the defaults should be fine). At this point, PostgreSQL on master can be restarted.

Upon the restart, if you kept the LOG=1 variable on in the ship_logs.sh script, you should see this in /tmp/postgres_wal_archiving.log, and confirm the files are over on the replica:

2014-07-02 Transfering file to backup server, filename: 000000010000000000000001
2014-07-02 Transfer to replica server completed
2014-07-02 Transfering file to backup server, filename: 000000010000000000000002
2014-07-02 Transfer to replica server completed

Once you see that, it's probably okay to change the LOG value from a 1 to a 0.

Now, onto the replica...

Prior to finishing the setup on the replica, we need to copy the data directory from master to catch the replica up, and then start up replication. We will need pg_basebackup to help us copy the data directory from master:

$ sudo service postgresql stop

# This will remove the data directory on the replica - careful!
$ sudo rm -rf /var/lib/pgsql/data
$ sudo mkdir /var/lib/pgsql/data
$ sudo chown postgres.postgres /var/lib/pgsql/data
$ sudo chmod 700 /var/lib/pgsql/data

# 10.0.0.1 is the master
$ sudo -u postgres pg_basebackup -h 10.0.0.1 -D /var/lib/postgresql/data -U replicator -v -P

PostgreSQL will show the status of the base backup from master as it is going. When completed, you should see the output as follows:

36924/36924 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed

One last step is to create a recovery.conf file which will essentially turn on replication on the replica. This file, recovery.conf, should be within /var/lib/pgsql/data on the replica:

# To be put on the replica postgres servers. 
# This is needed to switch over to the master if there is a /tmp/trigger file.
standby_mode          = 'on'
primary_conninfo      = 'host=10.0.0.1 port=5432 user=replicator password=some_password'
trigger_file = '/tmp/postgresql.trigger'
archive_cleanup_command = '/path/to/bin/pg_archivecleanup /var/lib/pgsql/walfiles %r'
restore_command='cp /var/lib/pgsql/walfiles/%f "%p"'

pg_archivecleanup does what it says it does - cleans up the archive files from the location we dump them to when they are not needed. This keeps the disk space available. pg_archivecleanup is available from the PostgreSQL contrib directory from the source or from Debian it is the package postgresql-contrib-9.3. It will have to be installed otherwise /var/lib/pgsql/walfiles will keep in growing in size. One can append sslmode=require to the primary_conninfo string in case the communication between master and replica is over a SSL connection.

Within the replica's postgresql.conf, which is now a mirror of the master's postgresql.conf, the following default should be changed to:

hot_standby = on

PostgreSQL can now be started on the replica. If all goes well, you should see this in the logfile:

2014-07-02 14:01:13 CDT   LOG:  consistent recovery state reached at 0/A041870
2014-07-02 14:01:13 CDT   LOG:  redo starts at 0/A041870
2014-07-02 14:01:13 CDT   LOG:  record with zero length at 0/A041910
2014-07-02 14:01:13 CDT   LOG:  database system is ready to accept read only connections
2014-07-02 14:01:13 CDT   LOG:  started streaming WAL from primary at 0/A000000 on timeline 1

If a table is created on the replica machine, PostgreSQL will automatically complain, since the replica are read only:

# create table test_this_replication (some_col text);
ERROR:  cannot execute CREATE TABLE in a read-only transaction