PostgreSQL HA Cluster | Server Configuration

Client Apps : psql | pg_ctl | initdb | pg_basebackup | …

Configuration (fnames/locations) : pg_hba.conf

TL;DR

Streaming Replication : Symmetric Servers

Streaming Replication with Replication Slots implemented on a symmetrical pair of containerized PostgreSQL servers running in a Docker swarm stack. This is a native PostgreSQL (v12/13) implementation, sans external dependencies. The servers are stateless. State is maintained in the persistent data store mounted thereto.

The functional difference between the two servers is that one is in primary (read/write) mode and the other is in hot standby (read-only). The primary is continuously archiving, and the standby is continuously recovering; both per WAL (file) shipping. Separately, point-in-time recovery (PITR) and cluster backup (base backup) are available ad-hoc, imperatively, while the servers are online. All such functionality is per canonical PostgreSQL implementation.

The symmetrical arrangement is robust and simple to configure. The distinctions between the two servers amount to a zero-byte signal file (standby.signal) existing exclusively in the $PGDATA directory of whichever is in standby mode, plus the requisite anti-symmetrical settings of the replication-connection parameters. Beyond that, they are identical. Each server has a unique host name, and each its own data and archive stores. Extending the scheme to multiple standby servers requires only cloning one standby and the processes that configure it.

Note that primary/standby are operational declarations, whereas each server/service is configured per hardware declarations and its relevant identity and connection parameters. The point being the former are swappable, while the latter must remain immutable; the (operational) role of a server/service is what toggles, not the server-service-hardware associations. All relevant code must abide by the distinction. In a containerized deployment, each PostgreSQL server is a named service, constrained (tethered) to its (configured) hardware regardless of its container(s) popping in and out of existence to provide the services.

Bootstrapping and ad hoc backup (tarball) processes both utilize the same PostgreSQL utility (pg_basebackup); both operate on the $PGDATA directory. After the servers' initialization and bootstrap, failover(s) are performed by merely adding the standby.signal file to the former (demoted) primary, and deleting the same from the former (promoted) standby. PITR is performed similarly. Though the servers can swap modes on demand at any time, the scheme is primarily for automated failover. Total transition time is set by application latency; there should never be two primary servers, so the demote/promote duration should be sufficient to assure this. Unless the PostgreSQL servers are spread across the globe, this is typically tens of milliseconds. That's the duration over which clients would lose write access on failover (or any other change of primary).
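A minimal failover sketch of that signal-file toggle, assuming two services named pg1 (current primary) and pg2 (current standby) with $PGDATA at /var/lib/postgresql/data; the names and paths are hypothetical, and the ordering matters so that two primaries never coexist:

# 1. Demote the old primary: stop it, then mark it as a standby.   (on pg1)
pg_ctl -D /var/lib/postgresql/data stop -m fast
touch /var/lib/postgresql/data/standby.signal

# 2. Promote the old standby: drop the signal file and restart,    (on pg2)
#    or run pg_ctl promote against the running standby.
rm /var/lib/postgresql/data/standby.signal
pg_ctl -D /var/lib/postgresql/data restart

# 3. Restart the demoted server; it now recovers as a standby,     (on pg1)
#    with its primary_conninfo pointing at the new primary.
pg_ctl -D /var/lib/postgresql/data start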

Routing requests to the appropriate server is an external responsibility. According to PostgreSQL documentation, best performance is achieved when the (hot) standby services all read requests, lightening the load on the primary as it services all write requests. The rationale is the asymmetry of the two workloads: reads are typically greater in number and lower in per-request computational intensity, while writes are the reverse on both counts.

Features / Functions / Modes / Topologies

Standalone Hot Backups | pg_basebackup

Simplest, but least live; not HA. See Barman solution.

pg_basebackup is used to take a base backup of a running PostgreSQL database cluster. The backup is taken without affecting other clients of the database, and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.

Makes an exact copy of your data directory, so all you need to do to restore from that backup is point Postgres at that directory and start it up.

pg_basebackup -h $hostname -U $username -D $local_dir
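A minimal restore sketch of that "point Postgres at the backup" step, assuming the backup was written to /backup/base and the server's data directory is /var/lib/postgresql/data (both hypothetical paths); the old cluster is set aside rather than deleted:

pg_ctl -D /var/lib/postgresql/data stop -m fast             # stop the damaged server, if running
mv /var/lib/postgresql/data /var/lib/postgresql/data.old    # keep the old cluster aside
cp -a /backup/base /var/lib/postgresql/data                  # the pg_basebackup output becomes $PGDATA
chown -R postgres:postgres /var/lib/postgresql/data
chmod 700 /var/lib/postgresql/data                           # Postgres refuses an over-permissive data directory
pg_ctl -D /var/lib/postgresql/data start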

Compressed

archive_command = 'gzip < %p > /var/lib/postgresql/data/%f'

restore_command = 'gunzip < /home/pgbasebackup/%f > %p'

Recover/Restore from Hot Backup

recovery.conf is OBSOLETE as of v12

FATAL: using recovery command file "recovery.conf" is not supported
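As of v12 the former recovery.conf settings go in postgresql.conf, and recovery is requested via a signal file instead. A minimal sketch, reusing the compressed restore_command above; the paths are hypothetical:

# postgresql.conf (formerly recovery.conf)
restore_command = 'gunzip < /home/pgbasebackup/%f > %p'

# request recovery with a signal file in $PGDATA, then start the server
touch /var/lib/postgresql/data/recovery.signal
pg_ctl -D /var/lib/postgresql/data start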

Streaming Replication

pg_basebackup \
    -h 192.169.12.1 -p 5432 -U replicator \
    -D /var/lib/pgsql/13/data -Fp -R -Xs -P

Continuous Archiving and Point-in-Time Recovery (PITR)

PostgreSQL maintains a write ahead log (WAL) in the /pg_wal/ subdirectory of the data directory; the database can be restored to consistency by replaying the log entries made since the last checkpoint.

A third strategy for backing up databases: combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state.

Setting Up WAL Archiving

A running PostgreSQL system produces an indefinitely long sequence of WAL records ... divides this into WAL segment files (16MB apiece); given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then “recycles” them by renaming no-longer-needed segment files to higher segment numbers.

When archiving WAL data, we need to capture the contents of each segment file ... before the segment file is recycled for reuse.

... many different ways of “saving the data somewhere”: copy the segment files to an NFS-mounted directory on another machine, write them onto a tape drive, or batch them together and burn them onto CDs, ...

PostgreSQL lets the administrator specify a shell command to be executed to copy a completed segment file to wherever it needs to go. The command could be as simple as a cp, or it could invoke a complex shell script — it's all up to you.

To enable WAL archiving, set the wal_level configuration parameter to replica or higher, archive_mode to on, and specify the shell command to use in the archive_command configuration parameter. In practice these settings will always be placed in the postgresql.conf file.
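A minimal postgresql.conf sketch of those three settings, using the same archive directory as the example below; the directory is a placeholder for wherever segments should be saved:

# postgresql.conf (primary)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'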

Write Ahead Log (WAL) params

Example archive_command when run ...

test ! -f /mnt/server/archivedir/00000001000000A900000065 \
    && cp pg_wal/00000001000000A900000065 \
          /mnt/server/archivedir/00000001000000A900000065

The speed of the archiving command is unimportant as long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. If archiving falls significantly behind, this will increase the amount of data that would be lost in the event of a disaster. It will also mean that the /pg_wal/ directory will contain large numbers of not-yet-archived segment files, which could eventually exceed available disk space.

Monitor the archiving process to ensure that it is working as you intend.
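One way to do that is the pg_stat_archiver view; a minimal sketch via psql (connection parameters omitted):

psql -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"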

Replication

Recovering Using a Continuous Archive Backup

restore_command = 'cp /mnt/server/archivedir/%f %p'
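To stop at a particular moment rather than at the end of the archived WAL, a recovery target can be added. A minimal sketch; the timestamp and paths are placeholders:

# postgresql.conf on the restored cluster
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2021-01-01 12:00:00'
recovery_target_action = 'promote'

# request recovery, then start the server
touch /var/lib/postgresql/data/recovery.signal
pg_ctl -D /var/lib/postgresql/data start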

Log Shipping : Standby Servers

… continuous archiving to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails; Warm Standby or log shipping. … asynchronous, i.e., the WAL records are shipped after transaction commit.

Primary and standby servers work together, loosely coupled. The primary server operates in continuous archiving mode. Standby servers operate in continuous recovery mode, reading the WAL files from the primary.

Recovery performance is sufficiently good that the Warm Standby will typically be only moments away from full availability once it has been activated.

Hot Standby server is a standby server that can also be used for read-only queries.

No changes to the database tables are required to enable this capability, so it offers low administration overhead compared to some other replication solutions. This configuration also has relatively low performance impact on the primary server.

Standby Server Operation

A PostgreSQL server enters Standby Mode if a standby.signal file exists in its data directory when the server is started.

Standby servers continuously apply WAL received either directly from the master (streaming replication), or from a WAL archive (see restore_command).

/pg_wal

Shared archive directory; must be available to all standby servers in the cluster; the standby servers attempt to restore any WAL files found therein. That typically happens after a server restart, when the standby replays WAL that was streamed from the master before the restart; files can also be manually copied to /pg_wal at any time to have them replayed.

This is a kind of backup bin for Streaming Replication; if the standby servers can't keep up and the primary "recycles" WAL data before the standby servers process it from the stream, the data will automatically be recovered (replayed) from WAL files in this directory.

The Loop

At startup, the standby loops over the following sources of WAL until the server is stopped or failover is triggered by a trigger file:

1. Restore all WAL available in the archive location, calling restore_command, until restore_command fails at the end of the WAL available there.
2. Try to restore any WAL available in the pg_wal directory.
3. If that fails, and streaming replication has been configured, try to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal.

If streaming fails, or streaming replication is not configured, or the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again.

Prepare Master for Standby Servers

Set Up a Standby Server

Set up identically to the primary (WAL archiving, connections, and authentication) because the standby server will work as a primary server after failover.

If you're using a WAL archive, its size can be minimized using the archive_cleanup_command parameter to remove files that are no longer required by the standby server. The pg_archivecleanup utility is designed specifically to be used with archive_cleanup_command in typical single-standby configurations; see pg_archivecleanup. Note, however, that if you're using the archive for backup purposes, you need to retain files needed to recover from at least the latest base backup, even if they're no longer needed by the standby.

Example configuration (in postgresql.conf; as of v12 these settings no longer go in recovery.conf):

primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'

For streaming replication, set max_wal_senders high enough on the primary to allow all standby servers to connect simultaneously.
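A minimal primary-side sketch; the values are illustrative only:

# postgresql.conf (primary)
max_wal_senders = 5          # one per standby, plus headroom for pg_basebackup
max_replication_slots = 5    # if replication slots are used, per the TL;DR scheme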

Streaming Replication

The step that turns a file-based log-shipping standby into a streaming-replication standby is setting primary_conninfo to point to the primary server. Set listen_addresses and authentication options (see pg_hba.conf) on the primary so that the standby server can connect.
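A minimal sketch of those primary-side settings, assuming a replication role named replicator and a standby on the 192.168.1.0/24 network (both assumptions):

# postgresql.conf (primary)
listen_addresses = '*'

# pg_hba.conf (primary): let the standby connect to the replication pseudo-database
host    replication    replicator    192.168.1.0/24    md5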

… allows a standby server to stay more up-to-date than is possible with file-based log shipping; the primary streams WAL records to the standby as they're generated; asynchronous by default; a small delay between committing a transaction in the primary and the changes becoming visible in the standby; the delay is much smaller than with file-based log shipping, typically under one second assuming the standby is powerful enough to keep up with the load. With streaming replication, archive_timeout is not required to reduce the data loss window.

Use in conjunction with file-based continuous archiving; set up a WAL archive that's accessible from the standby, else ...

If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_size to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments.
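A minimal sketch of the two alternatives; the slot name and size are placeholders:

# Option A: a physical replication slot, created on the primary ...
psql -c "SELECT pg_create_physical_replication_slot('standby1');"
# ... and referenced on the standby:
# postgresql.conf (standby)
primary_slot_name = 'standby1'

# Option B: retain extra WAL on the primary instead
# postgresql.conf (primary)
wal_keep_size = '1GB'        # v13; v12 uses wal_keep_segments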

On systems that support the keepalive socket option, setting tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count helps the primary promptly notice a broken connection.
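A minimal sketch; the values are illustrative:

# postgresql.conf (primary)
tcp_keepalives_idle = 60       # seconds of idle before keepalives start
tcp_keepalives_interval = 10   # seconds between keepalive probes
tcp_keepalives_count = 5       # lost probes before the connection is considered dead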

Set the maximum number of concurrent connections from the standby servers (see max_wal_senders for details).

When the standby is started and primary_conninfo is set correctly, the standby will connect to the primary after replaying all WAL files available in the archive. If the connection is established successfully, you will see a walreceiver in the standby, and a corresponding walsender process in the primary.
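A minimal verification sketch via psql (connection details omitted):

# on the primary: one row per connected standby (walsender)
psql -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"

# on the standby: the walreceiver's status
psql -c "SELECT status, sender_host, sender_port FROM pg_stat_wal_receiver;"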

Parameters

Hot Standby params