go-hasql

[feature] Support replication lag checking

Open · bbrodriges opened this issue 4 years ago · 6 comments

This commit adds the ability to filter replica nodes by replication lag. The user can provide a ReplicationLagChecker function and a MaxReplicationLag value as options to the cluster constructor; any replica that lags too far behind is removed from the set of alive nodes.

Example:

// Construct cluster nodes
nodes := []hasql.Node{...}

// Use options to fine-tune cluster behavior
opts := []hasql.ClusterOption{
	hasql.WithUpdateInterval(2 * time.Second),        // set custom update interval
	hasql.WithNodePicker(hasql.PickNodeRoundRobin()), // set desired nodes selection algorithm
	hasql.WithReplicationLagChecker(checkers.PostgreSQLReplicationLag), // set appropriate replication lag checker
	hasql.WithMaxReplicationLag(10 * time.Second), // set desired maximum lag value
}

// Create cluster handler
c, err := hasql.NewCluster(nodes, checkers.PostgreSQL, opts...)
if err != nil {
	panic(err)
}
defer func() { _ = c.Close() }() // close cluster when it is not needed

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Wait for any alive standby
node, err := c.WaitForStandby(ctx)
if err != nil {
	panic(err)
}

...
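The example shows where the new options plug in, but not the checker itself. Below is a minimal sketch of a PostgreSQL lag checker, assuming it takes a context and a `*sql.DB` and returns the node's current lag as a `time.Duration` (the actual signature added by the PR may differ):

```go
package checkers // package layout is an assumption, not the PR's actual file

import (
	"context"
	"database/sql"
	"time"
)

// PostgreSQLReplicationLag is a sketch of a per-node lag checker: it reports how
// far a standby is behind, measured as time since its last replayed transaction.
// Returning time.Duration is an assumption about the checker type.
func PostgreSQLReplicationLag(ctx context.Context, db *sql.DB) (time.Duration, error) {
	// pg_last_xact_replay_timestamp() is NULL on a primary (it is not in recovery),
	// so COALESCE the result to zero lag there.
	const query = `
		SELECT COALESCE(
			EXTRACT(EPOCH FROM clock_timestamp() - pg_last_xact_replay_timestamp()),
			0
		)`

	var lagSeconds float64
	if err := db.QueryRowContext(ctx, query).Scan(&lagSeconds); err != nil {
		return 0, err
	}
	return time.Duration(lagSeconds * float64(time.Second)), nil
}
```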

bbrodriges · Jun 22 '21 15:06

Codecov Report

Merging #4 (e71a5ac) into master (ebd5959) will decrease coverage by 2.34%. The diff coverage is 65.51%.


@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
- Coverage   85.25%   82.90%   -2.35%     
==========================================
  Files           7        7              
  Lines         217      234      +17     
==========================================
+ Hits          185      194       +9     
- Misses         27       35       +8     
  Partials        5        5              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 82.90% <65.51%> (-2.35%) :arrow_down: |


| Impacted Files | Coverage Δ |
| --- | --- |
| node.go | 75.00% <ø> (ø) |
| cluster_opts.go | 83.33% <50.00%> (-16.67%) :arrow_down: |
| check_nodes.go | 84.84% <63.63%> (-7.88%) :arrow_down: |
| cluster.go | 88.49% <100.00%> (+0.20%) :arrow_up: |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update ebd5959...e71a5ac.

codecov-commenter · Jun 22 '21 15:06

We should also consider the corner case of a dead primary. So it seems our logic should be more complex (and stateful):

  1. If the primary is alive, we could select replicas from `pg_stat_replication` with `state = 'streaming'`, `reply_time < <some threshold>` and `replay_lag < <some threshold>` (and remember the set of replicas passing our checks on each iteration)
  2. If the primary is dead and we have previous replica lag selection results, we could use them
  3. If the primary is dead and we have no previous replica lag selection results, we could take the replica with the maximal `pg_last_wal_replay_lsn()` and compute lags on the other replicas from its `pg_last_xact_replay_timestamp()` (with `select max_replica_replay_ts - pg_last_xact_replay_timestamp()`)

Any thoughts?
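A minimal sketch of step 1, assuming Go with `database/sql` and a primary-side query over `pg_stat_replication` (the helper name and threshold handling are illustrative; `reply_time` needs PG 12, `replay_lag` PG 10; imports as in the earlier checker sketch):

```go
// aliveReplicasByLag is a hypothetical helper for step 1: it runs on the primary
// and returns the application_name of every streaming replica whose feedback is
// fresh and whose replay lag is under maxLag.
func aliveReplicasByLag(ctx context.Context, primary *sql.DB, maxLag time.Duration) ([]string, error) {
	const query = `
		SELECT application_name
		FROM pg_stat_replication
		WHERE state = 'streaming'
		  AND reply_time > now() - make_interval(secs => $1)
		  AND replay_lag < make_interval(secs => $1)`

	rows, err := primary.QueryContext(ctx, query, maxLag.Seconds())
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var passed []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		passed = append(passed, name)
	}
	// The caller should remember this set so it can be reused if the primary dies (step 2).
	return passed, rows.Err()
}
```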

secwall · Jun 22 '21 17:06

> We should also consider the corner case of a dead primary. So it seems our logic should be more complex (and stateful):
>
> 1. If the primary is alive, we could select replicas from `pg_stat_replication` with `state = 'streaming'`, `reply_time < <some threshold>` and `replay_lag < <some threshold>` (and remember the set of replicas passing our checks on each iteration)
>
> 2. If the primary is dead and we have previous replica lag selection results, we could use them
>
> 3. If the primary is dead and we have no previous replica lag selection results, we could take the replica with the maximal `pg_last_wal_replay_lsn()` and compute lags on the other replicas from its `pg_last_xact_replay_timestamp()` (with `select max_replica_replay_ts - pg_last_xact_replay_timestamp()`)

Sounds good. The only possible problem I see is keeping state. That might be a bit more complicated than it sounds. But gotta try first.
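For what it's worth, the stateful part could be as small as this sketch (hypothetical names, assumes `import "sync"`): a guarded copy of the last passing set, updated while the primary is reachable and read when it is not.

```go
// lagState remembers the replica set that passed the lag check while the primary
// was alive, so the same set can be reused when the primary is unreachable.
type lagState struct {
	mu       sync.Mutex
	lastGood []string // application_name of replicas that passed the last primary-side check
}

// Update replaces the remembered set after a successful primary-side check.
func (s *lagState) Update(replicas []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastGood = append(s.lastGood[:0], replicas...)
}

// LastGood returns a copy of the remembered set for the dead-primary case.
func (s *lagState) LastGood() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.lastGood...)
}
```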

sidh · Jun 22 '21 18:06

> 1. If the primary is alive, we could select replicas from `pg_stat_replication` with `state = 'streaming'`, `reply_time < <some threshold>` and `replay_lag < <some threshold>` (and remember the set of replicas passing our checks on each iteration)

We should probably measure lag only from async streaming replicas.
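For illustration, that restriction would be one extra predicate on the same view; `sync_state` is a real `pg_stat_replication` column, the constant name is made up:

```go
// asyncReplicaLagQuery: same primary-side check as above, but restricted to
// asynchronous streaming replicas via sync_state.
const asyncReplicaLagQuery = `
	SELECT application_name
	FROM pg_stat_replication
	WHERE state = 'streaming'
	  AND sync_state = 'async'
	  AND replay_lag < make_interval(secs => $1)`
```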

bbrodriges · Jun 23 '21 10:06

Also, the `reply_time` column was only introduced in PG 12. We need to find a way to mimic this column for older versions.
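One way to handle this, sketched with a hypothetical helper that picks the lag query by server version (imports as in the sketches above):

```go
// replicaLagQuery picks a primary-side lag query per server version: reply_time
// exists since PG 12, the *_lag columns since PG 10, and before that the closest
// substitute is a byte-based LSN diff (a byte count, not a time interval).
func replicaLagQuery(ctx context.Context, db *sql.DB) (string, error) {
	var version int
	err := db.QueryRowContext(ctx, `SELECT current_setting('server_version_num')::int`).Scan(&version)
	if err != nil {
		return "", err
	}
	switch {
	case version >= 120000: // PG 12+: reply_time and replay_lag are both available
		return `SELECT application_name, replay_lag, reply_time
		        FROM pg_stat_replication WHERE state = 'streaming'`, nil
	case version >= 100000: // PG 10-11: replay_lag only, no reply_time to check feedback freshness
		return `SELECT application_name, replay_lag, NULL::timestamptz
		        FROM pg_stat_replication WHERE state = 'streaming'`, nil
	default: // pre-10: no lag columns at all, fall back to a byte diff
		return `SELECT application_name,
		               pg_xlog_location_diff(sent_location, replay_location),
		               NULL::timestamptz
		        FROM pg_stat_replication WHERE state = 'streaming'`, nil
	}
}
```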

bbrodriges · Jun 23 '21 10:06

> > 1. If the primary is alive, we could select replicas from `pg_stat_replication` with `state = 'streaming'`, `reply_time < <some threshold>` and `replay_lag < <some threshold>` (and remember the set of replicas passing our checks on each iteration)
>
> We should probably measure lag only from async streaming replicas.

Hmm, should we? There are several possible cases:

  1. We use quorum replication => all replicas are quorum replicas and only the analytical ones are async. Should we limit our selection to analytical ones? Probably not. But different quorum replicas could have different lag, so we should check it.
  2. We use `on` or `remote_write` for `synchronous_commit` (for performance reasons). In this mode sync replicas could be significantly behind async ones in terms of WAL apply, so sync replicas should also be checked for lag.
  3. We use `remote_apply` for `synchronous_commit`. Yep, this is the case where we could omit checking sync replica lag. Unfortunately, `synchronous_commit` is a session-level setting, so some sessions could still behave like case 2. We should probably be ready for that; see the sketch below.
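Taking these cases together, here is a sketch of the opposite choice: filter by `replay_lag` alone and report `sync_state` without using it to exclude anyone (the constant name is made up; the threshold is passed in seconds):

```go
// allStreamingReplicasUnderLag: quorum and sync replicas can also fall behind on
// WAL apply, so nothing is filtered by sync_state; only replay_lag decides.
const allStreamingReplicasUnderLag = `
	SELECT application_name, sync_state, replay_lag
	FROM pg_stat_replication
	WHERE state = 'streaming'
	  AND replay_lag < make_interval(secs => $1)`
```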

secwall · Jun 23 '21 11:06