
Best practice to handle "no old tuple data for UPDATE in table"?

Open qianl15 opened this issue 1 year ago • 11 comments

Hello, I've been using wal2json to export changes from a Postgres instance. I use pg_logical_slot_peek_changes to obtain changes and then pg_replication_slot_advance to advance the slot separately. However, very occasionally I would get a no old tuple data for UPDATE in table warning message emitted from wal2json, and the update record would have an empty identity field.

I found the source code that emits this message, but I wonder why it could happen. If this happens, should I retry pg_logical_slot_peek_changes? What is the best practice for handling this situation?

I also couldn't find a way to reproduce this issue. This happens randomly once every few days on a database with low load.

Any help would be appreciated. Thanks!

qianl15 avatar May 23 '24 00:05 qianl15

@qianl15 there are a few cases related to this message. Read about REPLICA IDENTITY. Postgres writes the information (old tuple) in the WAL that identifies the row that it is updating or deleting.

This message generally means:

a. REPLICA IDENTITY DEFAULT: your table does not contain a primary key. That's the common case.
b. REPLICA IDENTITY NOTHING: it does not store the old row. Your setup is wrong and you should change it to DEFAULT (if the table has a primary key).

If it is case (a) and you don't want to add a primary key, you have two options (a short SQL sketch follows the list):

  • REPLICA IDENTITY FULL: it stores all columns as the old tuple, which can increase the amount of WAL written.
  • REPLICA IDENTITY USING INDEX: it stores the columns of a unique index as the old tuple.
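
For reference, a minimal SQL sketch of checking and changing the replica identity (the table and index names here are placeholders, not taken from this issue):

-- Check the current setting: 'd' = default, 'n' = nothing, 'f' = full, 'i' = index
SELECT relreplident FROM pg_class WHERE relname = 'mytable';

-- Option 1: log all columns as the old tuple (writes more WAL)
ALTER TABLE mytable REPLICA IDENTITY FULL;

-- Option 2: log the columns of a unique, non-partial, non-deferrable index on NOT NULL columns
ALTER TABLE mytable REPLICA IDENTITY USING INDEX mytable_some_unique_idx;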

eulerto avatar May 23 '24 01:05 eulerto

Thanks @eulerto for your quick response! To clarify: my table contains a primary key and uses REPLICA IDENTITY DEFAULT. In my test, I insert 10,000 rows into that table, peek the changes, export them to another database, and finally advance the replication slot. I repeat this test many times a day, but the empty-identity issue only happens once every few days. More precisely, occasionally only one out of thousands of update records has an empty identity field.

The table looks like this:

create table hello (
  name text,
  greet_count integer,
  primary key (name)
);

My workload simply updates the greet_count of the same row 10,000 times.

The peek slot command I used:

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

where I limit to peek up to 5000 changes each time.

Environment: RDS Postgres 16.1

qianl15 avatar May 23 '24 02:05 qianl15

What's your Postgres version? Are you using the latest wal2json version? What's the SQL command to advance the replication slot?

Did you observe any pattern in the missing old tuples? Could you share a test case that reproduces the issue? (Even if the test case takes several hours to reproduce.)

eulerto avatar May 23 '24 02:05 eulerto

Postgres version: PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
wal2json version: 2.5 (pre-installed by RDS)

Advance slot command: SELECT * FROM pg_replication_slot_advance('myslot', <latest_nextlsn>). I iterate through the records returned by pg_logical_slot_peek_changes and obtain latest_nextlsn from the "nextlsn" field in the last "C" record of the batch.

I didn't see any particular pattern; it appears to happen rarely, at random intervals. I will try to reproduce it on my end as well. The actual test depends on a larger internal system on our side, but here's a simplified description of it.

-- Create table
create table hello (name  text, greet_count  integer, primary key (name));

-- Create replication slot
select * from pg_create_logical_replication_slot('myslot', 'wal2json');

-- Repeat this command many times
INSERT INTO hello (name, greet_count) VALUES ('test', 1) ON CONFLICT (name) DO UPDATE SET greet_count = hello.greet_count + 1 RETURNING greet_count;

-- The following steps advance the replication slot
-- 1. Peek changes in batches

select lsn::text, xid::text, data from pg_logical_slot_peek_changes('myslot', NULL, 5000, 'format-version', '2', 'include-types', 'false', 'filter-tables', '*.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'include-lsn', 'true')

-- 2. Iterate through the records from the above command and get the nextlsn from the last commit record (see the sketch after these steps)

-- 3. Advance slot with the latest nextlsn
SELECT * FROM pg_replication_slot_advance('myslot', <latest_nextlsn>);

-- 4. Repeat steps 1~3 until no more records are read
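
As an illustration of step 2, here is a sketch of pulling the nextlsn out of the last commit ("C") record with SQL/JSON; in practice the field is read while iterating the batch from step 1, and this assumes the data column is valid JSON (which it is with format-version 2):

-- Sketch only: peek again and keep just the last commit record's nextlsn
select (data::jsonb)->>'nextlsn' as latest_nextlsn
from pg_logical_slot_peek_changes('myslot', NULL, 5000,
  'format-version', '2', 'include-transaction', 'true', 'include-lsn', 'true')
where (data::jsonb)->>'action' = 'C'
order by lsn desc
limit 1;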

qianl15 avatar May 23 '24 17:05 qianl15

This has only occurred once since I first reported it, even though the test runs continuously every hour. Another observation: if I run pg_logical_slot_peek_changes again, the data comes back normal. It looks like a very rare and transient issue.

Are there any other traces I can provide to help figure out the issue?

qianl15 avatar May 27 '24 02:05 qianl15

I need a test case. The WAL records might also be useful: use pg_walinspect to extract the affected transaction. Are you using pg_logical_emit_message?
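
For example, something along these lines, where the LSN range and xid are placeholders to fill in for the affected transaction (pg_walinspect ships with Postgres 15 and later):

CREATE EXTENSION IF NOT EXISTS pg_walinspect;

-- Dump the WAL records in the suspect LSN range, filtered to the affected transaction
SELECT * FROM pg_get_wal_records_info('<start_lsn>', '<end_lsn>')
WHERE xid = <affected_xid>;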

eulerto avatar Jun 04 '24 02:06 eulerto

Hello! I will get back to you in a bit regarding the pg_walinspect info.

In this particular case, no pg_logical_emit_message was used. Is there anything I need to pay attention to regarding pg_logical_emit_message?

qianl15 avatar Jun 23 '24 21:06 qianl15

Hi @eulerto here's the pg_walinspect output from the affected transaction and the transactions surrounding it. The xid was 34826270. All transactions executed the same set of statements (the INSERT ... ON CONFLICT DO UPDATE command I provided above, plus a command to insert into another append-only table).

  start_lsn   |   end_lsn    |   prev_lsn   |   xid    | resource_manager | record_type | record_length | main_data_length | fpi_length |                                            description                                            |                                                             block_ref
--------------+--------------+--------------+----------+------------------+-------------+---------------+------------------+------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------
 160/2130D1B8 | 160/2130D280 | 160/2130D188 | 34826269 | Heap             | INSERT      |           194 |                3 |          0 | off: 45, flags: 0x08                                                                              | blkref #0: rel 1663/16562/16564 fork main blk 18812
 160/2130D280 | 160/2130DBA0 | 160/2130D1B8 | 34826269 | Btree            | INSERT_LEAF |          2331 |                2 |       2276 | off: 59                                                                                           | blkref #0: rel 1663/16562/16569 fork main blk 10073 (FPW); hole: offset: 332, length: 3540, compression saved: 2376, method: zstd
 160/2130DBA0 | 160/2130DBE8 | 160/2130D280 | 34826269 | Transaction      | COMMIT      |            65 |               36 |          0 | 2024-06-24 21:03:59.298358+00; origin: node 1, lsn 160/2130D1B8, at 2024-06-24 21:03:59.296157+00 |
 160/2130DBE8 | 160/2130DC20 | 160/2130DBA0 | 34826270 | Heap             | LOCK        |            54 |                8 |          0 | xmax: 34826270, off: 49, infobits: [LOCK_ONLY, EXCL_LOCK], flags: 0x00                            | blkref #0: rel 1663/16424/16439 fork main blk 1
 160/2130DC20 | 160/2130DC70 | 160/2130DBE8 | 34826270 | Heap             | HOT_UPDATE  |            78 |               14 |          0 | old_xmax: 34826270, old_off: 49, old_infobits: [], flags: 0x10, new_xmax: 34826270, new_off: 50   | blkref #0: rel 1663/16424/16439 fork main blk 1
 160/2130DC70 | 160/2130DD30 | 160/2130DC20 | 34826270 | Heap             | INSERT      |           191 |                3 |          0 | off: 15, flags: 0x08                                                                              | blkref #0: rel 1663/16424/16453 fork main blk 18814
 160/2130DD30 | 160/2130E7D0 | 160/2130DC70 | 34826270 | Btree            | INSERT_LEAF |          2692 |                2 |       2637 | off: 68                                                                                           | blkref #0: rel 1663/16424/16459 fork main blk 5596 (FPW); hole: offset: 380, length: 2820, compression saved: 2735, method: zstd
 160/2130E7D0 | 160/2130E810 | 160/2130DD30 | 34826270 | Btree            | INSERT_LEAF |            64 |                2 |          0 | off: 168                                                                                          | blkref #0: rel 1663/16424/16461 fork main blk 436
 160/2130E810 | 160/2130E840 | 160/2130E7D0 | 34826270 | Transaction      | COMMIT      |            46 |               20 |          0 | 2024-06-24 21:03:59.307775+00                                                                     |
 160/2130E840 | 160/2130E908 | 160/2130E810 | 34826271 | Heap             | INSERT      |           194 |                3 |          0 | off: 46, flags: 0x08                                                                              | blkref #0: rel 1663/16562/16564 fork main blk 18812
 160/2130E908 | 160/2130F390 | 160/2130E840 | 34826271 | Btree            | INSERT_LEAF |          2690 |                2 |       2635 | off: 68                                                                                           | blkref #0: rel 1663/16562/16569 fork main blk 5596 (FPW); hole: offset: 380, length: 2820, compression saved: 2737, method: zstd
 160/2130F390 | 160/2130F3D8 | 160/2130E908 | 34826271 | Transaction      | COMMIT      |            65 |               36 |          0 | 2024-06-24 21:03:59.310571+00; origin: node 1, lsn 160/2130E840, at 2024-06-24 21:03:59.307775+00 |
(12 rows)

The Postgres log:

DETAIL: Streaming transactions committing after 160/2130D1B8, reading WAL from 160/20430548.
STATEMENT: select lsn::text, xid::text, data from pg_logical_slot_peek_changes($1, NULL, $2, 'format-version', '2', 'include-types', 'false', 'filter-tables', 'dbos.*, *.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'add-msg-prefixes', 'dbosprov', 'include-lsn', 'true')
LOG: logical decoding found consistent point at 160/20430548
DETAIL: There are no running transactions.
STATEMENT: select lsn::text, xid::text, data from pg_logical_slot_peek_changes($1, NULL, $2, 'format-version', '2', 'include-types', 'false', 'filter-tables', 'dbos.*, *.knex_migrations, *.knex_migrations_lock', 'include-transaction', 'true', 'add-msg-prefixes', 'dbosprov', 'include-lsn', 'true')
WARNING: no old tuple data for UPDATE in table "public"."dbos_hello"
CONTEXT: slot "hello_dbos_prov", output plugin "wal2json", in the change callback, associated LSN 160/2130DC20

But when I ran pg_logical_slot_peek_changes again, I got no warning messages and the old tuple identity was there. It looked like a transient issue.

qianl15 avatar Jun 25 '24 00:06 qianl15

The only explanation for the OIDs in the Relation being invalid in the following code is an invalidation.

https://github.com/eulerto/wal2json/blob/75629c2e1e81a12350cc9d63782fc53252185d8d/wal2json.c#L2453-L2462

The Relation seems to be invalidated during this code path and then OidIsValid returns false. It's a race condition. :(

eulerto avatar Jun 25 '24 21:06 eulerto

What could cause a Relation invalidation? (Sorry, I'm not familiar with PG internals.)

And would retrying pg_logical_slot_peek_changes be a workaround in this case?

qianl15 avatar Jun 25 '24 23:06 qianl15

See RelationCacheInvalidate. You should also check inval.c. Logical replication processes some invalidation messages when it calls AcceptInvalidationMessages. This function removes catalog cache and relation cache entries that are no longer useful; it also rebuilds the entries that are still needed.

The only workaround is to retry pg_logical_slot_peek_changes. (I'm investigating if this code can be changed to avoid this issue.)

eulerto avatar Jun 26 '24 02:06 eulerto