Final table swap fails with `cannot fetch toast data without an active snapshot` error
I'm trying to squeeze a large table with a lot of bloat. The worker seems to make it almost all the way through the squeeze process -- from squeeze.get_active_workers() I see ins_initial slowly rise, then ins / upd / del rise quickly, and but then a failure where the original table remains in place. In squeeze.errors, I see a row like this:
| id | occurred | tabschema | tabname | sql_state | err_msg | err_detail |
|---|---|---|---|---|---|---|
| 7 | 2025-03-19 12:43:16.197313+00 | public | app_background_actions | XX000 | cannot fetch toast data without an active snapshot |
We've successfully squeezed other tables on this database. I've tried a few times at different times of day and with less and less active load on the table, but I've gotten the same error each time.
Here's the other system stats:
PostgreSQL 13.18 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit- Running on Cruchybridge in GCP
- pg_squeeze version v1.6 from
pg_extension:
| oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition |
|---|---|---|---|---|---|---|---|
| 4064493083 | pg_squeeze | 10 | 4064493081 | 0 | 1.6 | NULL | NULL |
- other active logical replication slots on this database
defaultreplica identity for this table:
SELECT CASE relreplident
WHEN 'd' THEN 'default'
WHEN 'n' THEN 'nothing'
WHEN 'f' THEN 'full'
WHEN 'i' THEN 'index'
END AS replica_identity
FROM pg_class
WHERE oid = 'app_background_actions'::regclass;
| replica_identity |
|---|
| default |
- I don't see any errors in the database log from around the time of the swap, but I am not exactly sure what I am looking for.
I'm not quite sure what other debugging information to share -- I know this isn't very much. Let me know and I can try to collect it to understand what's happening!
Origin of error: https://github.com/postgres/postgres/blob/474aee3dfe8860994893959ef1566c51d3304aa3/src/backend/access/common/toast_internals.c#L646
Why do you think the error happens during the swap phase? Does server log indicate so? In such a case, the only idea I have so far is that pg_class entry for your table has some column (relacl, reloptions, relpartbound) TOASTed. It's unusual, but seems possible. Can you please check?
Why do you think the error happens during the swap phase? Does server log indicate so?
No hard evidence, just guessing based on watching the entry in the squeeze.get_active_workers() go through the normal phases. In the ~7 attempts I've ran, I always see ins_initial rise to about the row count of the table, with the other three values remaining 0, then them all rise, then the error. Is there a more verbose log I can access to share somehow?
In such a case, the only idea I have so far is that pg_class entry for your table has some column (relacl, reloptions, relpartbound) TOASTed. It's unusual, but seems possible. Can you please check?
We have a massive postgres catalog, so something like this might be happening. I did my best to check if any columns of pg_class are toasted for this table with this:
SELECT c.relname,
a.attname,
pg_column_size(c.*) AS total_size,
pg_column_size(c.relname) AS relname_size,
pg_column_size(c.relacl) AS relacl_size,
pg_column_size(c.reloptions) AS reloptions_size,
pg_column_size(c.relpartbound) AS relpartbound_size
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = 'pg_class'::regclass
WHERE c.relname = 'app_background_actions'
AND a.attnum > 0
AND a.attisdropped = false
ORDER BY a.attnum;
and got this out, which suggests that none are I think:
| relname | attname | total_size | relname_size | relacl_size | reloptions_size | relpartbound_size |
|---|---|---|---|---|---|---|
| app_background_actions | oid | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relname | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relnamespace | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reltype | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reloftype | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relowner | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relam | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relfilenode | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reltablespace | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relpages | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reltuples | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relallvisible | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reltoastrelid | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relhasindex | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relisshared | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relpersistence | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relkind | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relnatts | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relchecks | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relhasrules | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relhastriggers | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relhassubclass | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relrowsecurity | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relforcerowsecurity | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relispopulated | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relreplident | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relispartition | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relrewrite | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relfrozenxid | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relminmxid | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relacl | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | reloptions | 452 | 64 | 93 | 184 | NULL |
| app_background_actions | relpartbound | 452 | 64 | 93 | 184 | NULL |
Hm, pg_class actually has no TOAST relation.
Can you please dump the table (without data), restore it in another database and try if the squeezing completes? Failure here would reveal that there is something specific about the table definition.
If that works, can you please get the backtrace of the error? (The PID to attach to can be retrieved from get_active_workers(), unless the worker exits too early.) Thanks.
This is rather a wild idea, anyway: I wonder if you managed to "accidentally" add TOAST table to some system catalog. It's probably not very likely, but might be worth checking. Can you please share the output of this query?
select count(*) from pg_class where oid < 16384 and reltoastrelid > 0;
The count returns 38 rows, here's the raw result:
| oid | relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | reltoastrelid | relhasindex | relisshared | relpersistence | relkind | relnatts | relchecks | relhasrules | relhastriggers | relhassubclass | relrowsecurity | relforcerowsecurity | relispopulated | relreplident | relispartition | relrewrite | relfrozenxid | relminmxid | relacl | reloptions | relpartbound |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 826 | pg_default_acl | 11 | 12042 | 0 | 10 | 2 | 826 | 0 | 1 | 20 | 1 | 4143 | 1 | 0 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1213 | pg_tablespace | 11 | 12027 | 0 | 10 | 2 | 0 | 1664 | 1 | 2 | 1 | 4185 | 1 | 1 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257483384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1247 | pg_type | 11 | 71 | 0 | 10 | 2 | 0 | 0 | 42415 | 862704 | 42415 | 4171 | 1 | 0 | p | r | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257520402 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1255 | pg_proc | 11 | 81 | 0 | 10 | 2 | 0 | 0 | 99 | 3270 | 99 | 2836 | 1 | 0 | p | r | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257483384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1260 | pg_authid | 11 | 2842 | 0 | 10 | 2 | 0 | 1664 | 2 | 33 | 2 | 4175 | 1 | 1 | p | r | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257260192 | 305707150 | {postgres=arwdDxt/postgres} | NULL | NULL |
| 1262 | pg_database | 11 | 1248 | 0 | 10 | 2 | 0 | 1664 | 1 | 5 | 1 | 4177 | 1 | 1 | p | r | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1417 | pg_foreign_server | 11 | 12037 | 0 | 10 | 2 | 1417 | 0 | 0 | 0 | 0 | 4151 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 1418 | pg_user_mapping | 11 | 12038 | 0 | 10 | 2 | 1418 | 0 | 0 | 0 | 0 | 4173 | 1 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257520402 | 305707158 | {postgres=arwdDxt/postgres} | NULL | NULL |
| 2328 | pg_foreign_data_wrapper | 11 | 12036 | 0 | 10 | 2 | 2328 | 0 | 1 | 2 | 1 | 4149 | 1 | 0 | p | r | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257270919 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2396 | pg_shdescription | 11 | 12029 | 0 | 10 | 2 | 0 | 1664 | 1 | 3 | 1 | 2846 | 1 | 1 | p | r | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257493699 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2600 | pg_aggregate | 11 | 12013 | 0 | 10 | 2 | 2600 | 0 | 2 | 141 | 2 | 4159 | 1 | 0 | p | r | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257378384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2604 | pg_attrdef | 11 | 12000 | 0 | 10 | 2 | 2604 | 0 | 42435 | 711242 | 42435 | 2830 | 1 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257427665 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2606 | pg_constraint | 11 | 12001 | 0 | 10 | 2 | 2606 | 0 | 6130 | 181951 | 6130 | 2832 | 1 | 0 | p | r | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257427665 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2609 | pg_description | 11 | 12020 | 0 | 10 | 2 | 2609 | 0 | 42 | 4844 | 42 | 2834 | 1 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257427665 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2612 | pg_language | 11 | 12010 | 0 | 10 | 2 | 2612 | 0 | 1 | 4 | 1 | 4157 | 1 | 0 | p | r | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2615 | pg_namespace | 11 | 12023 | 0 | 10 | 2 | 2615 | 0 | 14 | 1083 | 14 | 4163 | 1 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2618 | pg_rewrite | 11 | 12017 | 0 | 10 | 2 | 2618 | 0 | 13 | 136 | 13 | 2838 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257483384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2619 | pg_statistic | 11 | 12016 | 0 | 10 | 2 | 2619 | 0 | 211831 | 2188087 | 211417 | 2840 | 1 | 0 | p | r | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 159478976 | 305695944 | {postgres=arwdDxt/postgres} | NULL | NULL |
| 2620 | pg_trigger | 11 | 12018 | 0 | 10 | 2 | 2620 | 0 | 6 | 262 | 6 | 2336 | 1 | 0 | p | r | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257493699 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 2964 | pg_db_role_setting | 11 | 12026 | 0 | 10 | 2 | 0 | 1664 | 1 | 14 | 1 | 2966 | 1 | 1 | p | r | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257260192 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3079 | pg_extension | 11 | 12035 | 0 | 10 | 2 | 3079 | 0 | 1 | 11 | 1 | 4147 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257270919 | 305707152 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3118 | pg_foreign_table | 11 | 12039 | 0 | 10 | 2 | 3118 | 0 | 0 | 0 | 0 | 4153 | 1 | 0 | p | r | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257280233 | 305707152 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3256 | pg_policy | 11 | 12040 | 0 | 10 | 2 | 3256 | 0 | 0 | 0 | 0 | 4167 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257493699 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3350 | pg_partitioned_table | 11 | 12046 | 0 | 10 | 2 | 3350 | 0 | 0 | 0 | 0 | 4165 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257472497 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3381 | pg_statistic_ext | 11 | 12014 | 0 | 10 | 2 | 3381 | 0 | 1 | 19 | 1 | 3439 | 1 | 0 | p | r | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257483384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3394 | pg_init_privs | 11 | 12043 | 0 | 10 | 2 | 3394 | 0 | 3 | 193 | 3 | 4155 | 1 | 0 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3429 | pg_statistic_ext_data | 11 | 12015 | 0 | 10 | 2 | 3429 | 0 | 1 | 19 | 1 | 3430 | 1 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257460173 | 305707158 | {postgres=arwdDxt/postgres} | NULL | NULL |
| 3456 | pg_collation | 11 | 12045 | 0 | 10 | 2 | 3456 | 0 | 25 | 779 | 25 | 4161 | 1 | 0 | p | r | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257260192 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3466 | pg_event_trigger | 11 | 12019 | 0 | 10 | 2 | 3466 | 0 | 0 | 0 | 0 | 4145 | 1 | 0 | p | r | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257148318 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3592 | pg_shseclabel | 11 | 4066 | 0 | 10 | 2 | 0 | 1664 | 0 | 0 | 0 | 4060 | 1 | 1 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257472497 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3596 | pg_seclabel | 11 | 12044 | 0 | 10 | 2 | 3596 | 0 | 0 | 0 | 0 | 3598 | 1 | 0 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257101805 | 305707150 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 3600 | pg_ts_dict | 11 | 12032 | 0 | 10 | 2 | 3600 | 0 | 1 | 23 | 1 | 4169 | 1 | 0 | p | r | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257520402 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 6000 | pg_replication_origin | 11 | 12041 | 0 | 10 | 2 | 0 | 1664 | 0 | 0 | 0 | 4181 | 1 | 1 | p | r | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257483384 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 6100 | pg_subscription | 11 | 6101 | 0 | 10 | 2 | 0 | 1664 | 0 | 0 | 0 | 4183 | 1 | 1 | p | r | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | n | 0 | 0 | 257472497 | 305707158 | {postgres=arwdDxt/postgres} | NULL | NULL |
| 13284 | sql_features | 13150 | 13285 | 0 | 10 | 2 | 13284 | 0 | 8 | 712 | 8 | 13286 | 0 | 0 | p | r | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | d | 0 | 0 | 257571453 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 13289 | sql_implementation_info | 13150 | 13290 | 0 | 10 | 2 | 13289 | 0 | 1 | 12 | 1 | 13291 | 0 | 0 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | d | 0 | 0 | 257581998 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
| 13294 | sql_parts | 13150 | 13295 | 0 | 10 | 2 | 13294 | 0 | 1 | 10 | 1 | 13296 | 0 | 0 | p | r | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | d | 0 | 0 | 257559073 | 305707158 | NULL | NULL | NULL |
| 13299 | sql_sizing | 13150 | 13300 | 0 | 10 | 2 | 13299 | 0 | 1 | 23 | 1 | 13301 | 0 | 0 | p | r | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | d | 0 | 0 | 257565538 | 305707158 | {postgres=arwdDxt/postgres,=r/postgres} | NULL | NULL |
Also, we ended up fighting through the pain without squeezing and have now dropped the problematic table in production, so our actual production issue has been solved, but I've also lost the ability to reproduce this and debug, apologies.