`IDENTIFY_SYSTEM` not compatible with Google AlloyDB

Open • jace-ys opened this issue 11 months ago • 1 comment

Hey there 👋🏻

For context, we are currently running AlloyDB - Google's "fully managed PostgreSQL-compatible database service".

Not exactly what it says on the tin, because we found a PostgreSQL incompatibility that breaks pgcopydb when we try to do a clone --follow with AlloyDB as the source (the target isn't important here).

From logs:

2025-01-22 15:36:28.889 13 ERROR  pgsql_timeline.c:157      Query returned 5 columns, expected 4
2025-01-22 15:36:28.889 13 SQL    pgsql_timeline.c:75       IDENTIFY_SYSTEM: timeline 0, xlogpos , systemid 0
2025-01-22 15:36:28.889 13 ERROR  pgsql_timeline.c:82       Failed to get result from IDENTIFY_SYSTEM

After some digging, we realized that calling IDENTIFY_SYSTEM on AlloyDB returns 5 columns, not 4:

test=> IDENTIFY_SYSTEM;
-[ RECORD 1 ]------------------
systemid  | 7439402673173786878
timeline  | 1
xlogpos   | 5E/C0078F00
dbname    | test
authtoken | <REDACTED>

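As you can see, Google added a fifth authtoken column that is not part of the IDENTIFY_SYSTEM result documented for the PostgreSQL replication protocol, which has only systemid, timeline, xlogpos, and dbname.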

While we understand the fault is mostly on Google's end, we were wondering if this check could be relaxed slightly: https://github.com/dimitri/pgcopydb/blob/ac625783e85be0fb99f8b21a282438ea3f7de22c/src/bin/pgcopydb/pgsql_timeline.c#L155-L160

Maybe checking for if (PQnfields(result) >= 4) would suffice and allow pgcopydb to work with AlloyDB? This shouldn't be a breaking change as pgcopydb can still extract the columns it cares about.
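To make the suggestion concrete, here is a minimal sketch of the relaxed check. It is written against a plain integer so that it stands alone; in pgcopydb the count would come from PQnfields(result). The helper name identify_system_has_enough_columns and the constant IDENTIFY_SYSTEM_MIN_COLUMNS are hypothetical and do not exist in the pgcopydb source.

```c
#include <stdbool.h>

/* Columns documented for IDENTIFY_SYSTEM:
 * systemid, timeline, xlogpos, dbname. */
#define IDENTIFY_SYSTEM_MIN_COLUMNS 4

/*
 * Hypothetical helper: nfields stands in for PQnfields(result).
 * Accept any result with at least the four standard columns, so that
 * extra trailing columns (such as AlloyDB's authtoken) are tolerated.
 */
static bool
identify_system_has_enough_columns(int nfields)
{
	return nfields >= IDENTIFY_SYSTEM_MIN_COLUMNS;
}
```

With this check a 5-column AlloyDB result passes, while a result with fewer than 4 columns is still rejected. It does assume the first four columns keep their usual positions.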

Open to other suggestions!

Jan 22 '25 17:01 jace-ys

Hi @jace-ys, thanks for opening this issue. I believe we can relax the check there, but in doing so we still want to make sure we can make sense of the returned columns. What if another Postgres fork reorders the columns in the result?

I think we should use the libpq API to fetch the column number from the column name for the 4 columns we are interested in, and then only error out if we fail to find one of them.
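A rough sketch of that name-based lookup, under some assumptions: to keep it compilable without a database connection, FakeResult and fake_fnumber are hypothetical stand-ins for a PGresult and for libpq's PQfnumber(), which returns the index of a named column or -1 when it is absent. The IdentifySystemColumns struct and locate_identify_system_columns are also illustrative names, not pgcopydb code, and the stand-in does an exact string match only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a PGresult: just the column names, in order. */
typedef struct
{
	const char **names;
	int count;
} FakeResult;

/* Stand-in for PQfnumber(): index of the named column, or -1 if missing. */
static int
fake_fnumber(const FakeResult *res, const char *name)
{
	for (int i = 0; i < res->count; i++)
	{
		if (strcmp(res->names[i], name) == 0)
		{
			return i;
		}
	}
	return -1;
}

/* Positions of the four columns we care about (hypothetical struct). */
typedef struct
{
	int systemid;
	int timeline;
	int xlogpos;
	int dbname;
} IdentifySystemColumns;

/*
 * Resolve each required column by name. Extra columns and reordering are
 * both tolerated; the function fails only if a required column is missing.
 */
static bool
locate_identify_system_columns(const FakeResult *res,
							   IdentifySystemColumns *cols)
{
	cols->systemid = fake_fnumber(res, "systemid");
	cols->timeline = fake_fnumber(res, "timeline");
	cols->xlogpos = fake_fnumber(res, "xlogpos");
	cols->dbname = fake_fnumber(res, "dbname");

	return cols->systemid >= 0 &&
		   cols->timeline >= 0 &&
		   cols->xlogpos >= 0 &&
		   cols->dbname >= 0;
}
```

In real code, each fake_fnumber(res, "...") call would presumably become PQfnumber(result, "..."), and the values would then be read with PQgetvalue(result, 0, column) using the resolved indexes instead of fixed positions.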

Do you want to prepare a Pull Request?

Jan 27 '25 13:01 dimitri