Error: "attempted to update invisible tuple" in truncate command
Bug Report
In the process of working on https://github.com/greenplum-db/gpdb/pull/17200, we found that a test case reported:
ERROR: spiexceptions.ObjectNotInPrerequisiteState: attempted to update invisible tuple
We reproduced it by adding a similar test to the latest 7X code (only the test case is added, with no change to GP code): https://github.com/greenplum-db/gpdb/compare/main...yaowangm:gpdb:v7_trunc_error
Same error in the dev pipeline: https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/wayao_trunc_error/jobs/icw_gporca_resgroup_rocky8/builds/1
This error is very similar to https://github.com/greenplum-db/gpdb/commit/3fbe189e45166765ac0e0c695e7eecf7aa49d9a6, but here it occurs on a heap table.
Greenplum version or build
latest 7X, top commit: https://github.com/greenplum-db/gpdb/commit/e747b2e1c2a67fd84a8213ccd5c261869c77d772 (2024-04-01)
OS version and uname -a
autoconf options used ( config.status --config )
Installation information ( pg_config )
Expected behavior
No errors.
Actual behavior
Error: "attempted to update invisible tuple"
Steps to reproduce the behavior
Please see the code diff linked above.
Simple reproduction using isolation2
create table t1(a int);
create table t2(a int);
1: begin;
1: select txid_current();
1: end;
2: begin;
2: select txid_current();
3: begin;
3: savepoint s1;
3: truncate t2;
4: begin;
4: select txid_current();
4: end;
5: begin;
5: select txid_current();
6: select * from t1 join (select oid from pg_class) x(a) on x.a = t1.a;
3: savepoint s2;
3: truncate t2;
1q:
2q:
3q:
4q:
5q:
6q:
Save the above SQL in a file a.sql, and then run:
python src/test/isolation2/sql_isolation_testcase.py < ~/a.sql
...
...
3: savepoint s2;
SAVEPOINT
3: truncate t2;
ERROR: attempted to update invisible tuple
1q: ... <quitting>
2q: ... <quitting>
3q: ... <quitting>
4q: ... <quitting>
5q: ... <quitting>
6q: ... <quitting>
Besides the error above, there is another error on https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/wayao_trunc_error/jobs/icw_gporca_jit_rocky8/builds/1:
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/gp_sort.out /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/gp_sort.out
--- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/gp_sort.out 2024-04-01 03:21:45.118432879 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/gp_sort.out 2024-04-01 03:21:45.122433151 +0000
...
+DETAIL: Key (oid)=(43510) already exists.
+ERROR: spiexceptions.UniqueViolation: duplicate key value violates unique constraint "pg_class_oid_index"
+CONTEXT: Traceback (most recent call last):
+ PL/Python function "gp_mksort_test", line 22, in <module>
+ plpy.execute("truncate gp_mksort_test_table;")
+PL/Python function "gp_mksort_test"
I suppose they are due to the same root cause.
The issue is easy to reproduce via the pipeline (though not 100% of the time), but difficult in my local environment.
Repro:
Under src/test/regress, create a new file my_schedule:
test: test_setup
test: gp_aggregates gp_aggregates_costs gp_metadata variadic_parameters default_parameters function_extensions spi gp_xml shared_scan update_gp triggers_gp returning_gp gp_types combocid_gp gp_sort gp_prepared_xacts gp_backend_info gp_locale foreign_key_gp
Then keep running the script:
./pg_regress --init-file=init_file --load-extension=gp_inject_fault --schedule=my_schedule
Repeat for a couple of minutes and we may see gp_sort fail.
I got a core dump by adding an extra assertion on the error code path:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f4c60d17859 in __GI_abort () at abort.c:79
#2 0x00005572b1c2cef0 in ExceptionalCondition (conditionName=conditionName@entry=0x5572b21d1c89 "!(0)",
errorType=errorType@entry=0x5572b21cc00b "FailedAssertion", fileName=fileName@entry=0x5572b21eda92 "heapam.c",
lineNumber=lineNumber@entry=3226) at assert.c:44
#3 0x00005572b1699c05 in heap_update_internal (relation=relation@entry=0x7f4c5307bdf0,
otid=otid@entry=0x5572bb2669ac, newtup=newtup@entry=0x5572bb2669a8, cid=<optimized out>,
crosscheck=crosscheck@entry=0x0, wait=wait@entry=true, tmfd=0x7ffe64a9df30, lockmode=0x7ffe64a9df2c, simple=true)
at heapam.c:3226
#4 0x00005572b169b2c2 in simple_heap_update (relation=relation@entry=0x7f4c5307bdf0, otid=otid@entry=0x5572bb2669ac,
tup=tup@entry=0x5572bb2669a8) at heapam.c:4108
#5 0x00005572b17759ba in CatalogTupleUpdate (heapRel=heapRel@entry=0x7f4c5307bdf0, otid=otid@entry=0x5572bb2669ac,
tup=tup@entry=0x5572bb2669a8) at indexing.c:286
#6 0x00005572b1c15840 in RelationSetNewRelfilenode (relation=relation@entry=0x7f4c53030b18, persistence=112 'p')
at relcache.c:3856
#7 0x00005572b187cbdf in ExecuteTruncateGuts (explicit_rels=explicit_rels@entry=0x5572bb2666a0,
relids=<optimized out>, relids@entry=0x5572bb266700, relids_logged=relids_logged@entry=0x0,
behavior=DROP_RESTRICT, restart_seqs=<optimized out>, stmt=stmt@entry=0x5572bb265fd0) at tablecmds.c:2333
#8 0x00005572b187d11d in ExecuteTruncate (stmt=stmt@entry=0x5572bb265fd0) at tablecmds.c:2116
#9 0x00005572b1ae5c35 in standard_ProcessUtility (pstmt=pstmt@entry=0x5572bb266218,
queryString=queryString@entry=0x7f4c528aa560 "truncate gp_mksort_test_table;",
context=context@entry=PROCESS_UTILITY_QUERY, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,
dest=dest@entry=0x5572b27871e0 <spi_printtupDR>, completionTag=0x7ffe64a9e340 "") at utility.c:637
#10 0x00005572b1ae6793 in ProcessUtility (pstmt=pstmt@entry=0x5572bb266218,
queryString=0x7f4c528aa560 "truncate gp_mksort_test_table;", context=PROCESS_UTILITY_QUERY,
params=params@entry=0x0, queryEnv=0x0, dest=dest@entry=0x5572b27871e0 <spi_printtupDR>,
completionTag=0x7ffe64a9e340 "") at utility.c:403
#11 0x00005572b1916707 in _SPI_execute_plan (plan=plan@entry=0x7ffe64a9e3e0, paramLI=paramLI@entry=0x0,
snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=false,
fire_triggers=fire_triggers@entry=true, tcount=0) at spi.c:2554
#12 0x00005572b1916b3e in SPI_execute (src=src@entry=0x7f4c528aa560 "truncate gp_mksort_test_table;",
read_only=<optimized out>, tcount=tcount@entry=0) at spi.c:637
#13 0x00007f4c52ffc05b in PLy_spi_execute_query (query=<optimized out>, limit=0) at plpy_spi.c:353
#14 0x00007f4c52ffc7f8 in PLy_spi_execute (self=<optimized out>, args=0x7f4c52895f10) at plpy_spi.c:180
#15 0x00007f4c52d23687 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#16 0x00007f4c52d23b1b in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#17 0x00007f4c52aefdf3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#18 0x00007f4c52af7ef6 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#19 0x00007f4c52afb06b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#20 0x00007f4c52aefd6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#21 0x00007f4c52af146d in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#22 0x00007f4c52c45e3b in _PyEval_EvalCodeWithName () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#23 0x00007f4c52c461c2 in PyEval_EvalCodeEx () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#24 0x00007f4c52c465af in PyEval_EvalCode () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#25 0x00007f4c52ff5be4 in PLy_procedure_call (proc=proc@entry=0x5572bb267920,
kargs=kargs@entry=0x7f4c530007ad "args", vargs=<optimized out>) at plpy_exec.c:1060
#26 0x00007f4c52ff7591 in PLy_exec_function (fcinfo=fcinfo@entry=0x5572b9d3ca58, proc=0x5572bb267920)
at plpy_exec.c:108
#27 0x00007f4c52ff85ee in plpython3_call_handler (fcinfo=0x5572b9d3ca58) at plpy_main.c:364
#28 0x00005572b18c5603 in ExecInterpExpr (state=0x5572b9d3c970, econtext=0x5572b9d3c660, isnull=0x7ffe64a9ef27)
at execExprInterp.c:639
#29 0x00005572b18c219b in ExecInterpExprStillValid (state=0x5572b9d3c970, econtext=0x5572b9d3c660,
isNull=0x7ffe64a9ef27) at execExprInterp.c:1864
#30 0x00005572b1904dce in ExecEvalExprSwitchContext (isNull=0x7ffe64a9ef27, econtext=0x5572b9d3c660,
state=0x5572b9d3c970) at ../../../src/include/executor/executor.h:362
#31 ExecProject (projInfo=0x5572b9d3c968) at ../../../src/include/executor/executor.h:396
#32 ExecResult (pstate=0x5572b9d3c548) at nodeResult.c:157
#33 0x00005572b18d4023 in ExecProcNodeGPDB (node=node@entry=0x5572b9d3c548) at execProcnode.c:645
#34 0x00005572b18d40d8 in ExecProcNodeFirst (node=0x5572b9d3c548) at execProcnode.c:600
#35 0x00005572b18ca0df in ExecProcNode (node=0x5572b9d3c548) at ../../../src/include/executor/executor.h:271
#36 ExecutePlan (estate=estate@entry=0x5572b9d3c240, planstate=0x5572b9d3c548, use_parallel_mode=<optimized out>,
operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=0,
direction=ForwardScanDirection, dest=0x5572b9a72908, execute_once=true) at execMain.c:2672
#37 0x00005572b18cafbf in standard_ExecutorRun (queryDesc=queryDesc@entry=0x5572b9bc3080,
direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=execute_once@entry=true)
at execMain.c:978
#38 0x00005572b18cb22e in ExecutorRun (queryDesc=queryDesc@entry=0x5572b9bc3080,
direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
at execMain.c:793
#39 0x00005572b1ae1ce1 in PortalRunSelect (portal=portal@entry=0x5572b9b6e900, forward=forward@entry=true, count=0,
count@entry=9223372036854775807, dest=dest@entry=0x5572b9a72908) at pquery.c:1147
#40 0x00005572b1ae3db9 in PortalRun (portal=portal@entry=0x5572b9b6e900, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x5572b9a72908,
altdest=altdest@entry=0x5572b9a72908, completionTag=0x7ffe64a9f450 "") at pquery.c:985
#41 0x00005572b1addc8f in exec_simple_query (
query_string=query_string@entry=0x5572b9a71550 "select gp_mksort_test();") at postgres.c:1922
#42 0x00005572b1ae0e7a in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5572b9a9e980, dbname=<optimized out>,
username=<optimized out>) at postgres.c:5393
#43 0x00005572b1a39ac7 in BackendRun (port=port@entry=0x5572b9a96680) at postmaster.c:4985
#44 0x00005572b1a3c068 in BackendStartup (port=port@entry=0x5572b9a96680) at postmaster.c:4662
#45 0x00005572b1a3dcf8 in ServerLoop () at postmaster.c:2016
#46 0x00005572b1a3f4c0 in PostmasterMain (argc=5, argv=<optimized out>) at postmaster.c:1631
#47 0x00005572b19508da in main (argc=5, argv=0x5572b9a6b1f0) at main.c:240
It seems that the catalog table was damaged under a particular concurrency scenario. More investigation is needed.
6X repro:
I tried to reproduce the issue on 6X, both in my local environment and in the pipeline, but have not succeeded so far.
I merged the gp_sort test case into 6X and kept running this script for a couple of hours:
num=1
while [ $num -le 1000000 ]
do
echo $num
num=$(( $num + 1 ))
./pg_regress --init-file=init_file --load-extension=gp_inject_fault --schedule=my_schedule |tee ./reg_result.txt;
res=`grep "gp_sort" ./reg_result.txt | grep "FAILED";`;
if [ -n "$res" ]
then
echo "got it!!!";
break;
fi
done
But it didn't reproduce.
I also ran the pipeline with the test case (https://dev.ci.gpdb.pivotal.io/teams/main/pipelines/wayao_v6_trunc_error), and everything passed.
We cannot draw a conclusion yet, because I removed some test cases from the schedule that don't exist on 6X, and the concurrency scenario is subtle. It may be that the error doesn't exist on 6X, or that the specific concurrency conditions have changed.
I'd suggest checking the issue on 6X after we get the RCA; by then we should know more about how to reproduce it.
After many hours, I am now able to reproduce this bug in my local environment with the help of GDB.
I used rr to record the bug and then find the RCA.
RCA
Session 1: a query that has subtransactions
- The 1st subtransaction truncates the table, finishes, and commits the subtransaction, introducing a new tuple in pg_class with (xmin=subtrans_xid, xmax=0)
- The 2nd subtransaction has yet to truncate the table
Session 2: a query that has an entryDB which will seqscan pg_class
The entryDB QE is a reader gang:
/* reader gang copy local snapshot from writer gang */
if (SharedLocalSnapshotSlot != NULL &&
(distributedTransactionContext == DTX_CONTEXT_QE_READER ||
distributedTransactionContext == DTX_CONTEXT_QE_ENTRY_DB_SINGLETON))
{
readerFillLocalSnapshot(snapshot, distributedTransactionContext);
return snapshot;
}
readerFillLocalSnapshot --> copyLocalSnapshot(snapshot) will set

snapshot->curcid = SharedLocalSnapshotSlot->snapshot.curcid;
snapshot->subxcnt = -1;

which means the snapshot of this session does not consider subtransactions. Blaming the code shows this is very, very old code.
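To make that concrete, here is a deliberately simplified, self-contained sketch of the failure mode (this is not GPDB's actual XidInMVCCSnapshot, which takes extra distributed-snapshot arguments and returns a result code; the function name here is made up for illustration):

#include "postgres.h"
#include "access/transam.h"
#include "utils/snapshot.h"

/*
 * Simplified illustration: the reader's copied snapshot carries no
 * subtransaction xids (subxcnt is -1), so the writer's subtrans_xid never
 * matches anything and the lookup falls through to "no longer running".
 */
static bool
xid_in_snapshot_simplified(TransactionId xid, Snapshot snapshot)
{
	int			i;

	if (TransactionIdPrecedes(xid, snapshot->xmin))
		return false;			/* older than every xact seen as running */
	if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
		return true;			/* started after the snapshot was taken */

	/* subxcnt is -1 for the copied reader snapshot, so this loop never runs */
	for (i = 0; i < snapshot->subxcnt; i++)
		if (TransactionIdEquals(xid, snapshot->subxip[i]))
			return true;

	/* only top-level xids are listed here, not subtrans_xid */
	for (i = 0; i < snapshot->xcnt; i++)
		if (TransactionIdEquals(xid, snapshot->xip[i]))
			return true;

	return false;				/* subtrans_xid is wrongly reported as done */
}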
Now session 2 scans the pg_class tuple created by session 1, which has subtrans_xid in its xmin. Testing MVCC visibility, session 2 finds subtrans_xid not in its MVCC snapshot and tries to set a hint bit on that tuple marking the xmin invalid:
snapshotCheckResult =
XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot,
((tuple->t_infomask2 & HEAP_XMIN_DISTRIBUTED_SNAPSHOT_IGNORE) != 0),
&setDistributedSnapshotIgnore);
if (setDistributedSnapshotIgnore)
{
tuple->t_infomask2 |= HEAP_XMIN_DISTRIBUTED_SNAPSHOT_IGNORE;
markDirty(buffer, relation, tuple, /* isXmin */ true);
}
if (snapshotCheckResult == XID_IN_SNAPSHOT)
return false;
else if (snapshotCheckResult == XID_SURELY_COMMITTED || TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
SetHintBits(tuple, buffer, relation, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetRawXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, relation, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}
Now things come back to session 1: in the 2nd subtransaction it goes on to do a catalog heap update, but the tuple has been wrongly hint-bitted, leading to the error.
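For context, in the upstream heap_update code that GPDB's heap_update_internal derives from (exact line numbers differ from the backtrace above), the error is raised when the visibility check at the top of the update reports the old tuple as invisible, roughly:

	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);

	if (result == TM_Invisible)
	{
		UnlockReleaseBuffer(buffer);
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("attempted to update invisible tuple")));
	}

With HEAP_XMIN_INVALID wrongly set by session 2, HeapTupleSatisfiesUpdate now reports the pg_class tuple as invisible to session 1's second subtransaction.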
I think this bug was introduced while merging the PG 9.6 upstream code: https://github.com/greenplum-db/gpdb-postgres-merge/commit/8a7d0701814a4e293efad22091d6f6fb441bbe1c
In GP6 (PG 9.4), it always scans the shared-memory procarray, so subtransactions are considered.
Just for reference and to keep the context: before commit https://github.com/greenplum-db/gpdb/commit/b52ca70fdfde78d4de0cd31b3a9cf7855c7a7239, SharedLocalSnapshot used to carry subtransaction information as well. It could go stale in some scenarios, as mentioned in that commit, and hence was removed; instead, a pointer to the writer's PGPROC was maintained to perform the lookup. But now that the code has been changed to consult the snapshot rather than the procarray, that logic is broken.
For more info: after discussing offline with @kainwen (thank you for the RCA of this hard issue), it is clear that I hit the same case on xmax. The related code in HeapTupleSatisfiesMVCC is pasted below:
1236 if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
1237 {
1238 if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))
1239 {
1240 if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
1241 return true; /* deleted after scan started */
1242 else
1243 return false; /* deleted before scan started */
1244 }
1245
1246 snapshotCheckResult = XidInMVCCSnapshot(HeapTupleHeaderGetRawXmax(tuple), snapshot,
1247 ((tuple->t_infomask2 & HEAP_XMAX_DISTRIBUTED_SNAPSHOT_IGNORE) != 0),
1248 &setDistributedSnapshotIgnore);
1249 if (setDistributedSnapshotIgnore)
1250 {
1251 tuple->t_infomask2 |= HEAP_XMAX_DISTRIBUTED_SNAPSHOT_IGNORE;
1252 markDirty(buffer, relation, tuple, /* isXmin */ false);
1253 }
1254 if (snapshotCheckResult == XID_IN_SNAPSHOT)
1255 return true;
1256
1257 if (!(snapshotCheckResult == XID_SURELY_COMMITTED || TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple))))
1258 {
1259 /* it must have aborted or crashed */
1260 SetHintBits(tuple, buffer, relation, HEAP_XMAX_INVALID,
1261 InvalidTransactionId);
1262 return true;
1263 }
1264
1265 /* xmax transaction committed */
1266 SetHintBits(tuple, buffer, relation, HEAP_XMAX_COMMITTED,
1267 HeapTupleHeaderGetRawXmax(tuple));
1268 }
The behavior is that the non-committed xmax (the current subxact xid) is not in the snapshot and TransactionIdDidCommit returns false, but the actual subxact status is in-progress (line 1257). In this situation the xmax is treated as aborted or crashed, so the hint bit HEAP_XMAX_INVALID is set. As a result, a corrupt tuple is left in pg_class with a visible but "invalid" xmax, even though the xmax was eventually committed. This corrupt record can get even worse after a CHECKPOINT: a later query on the table pointed to by this record will fail with ERROR: could not open file "base/21313/17523": No such file or directory.
I tried out a straightforward workaround: just wait while the xmax is in progress until it reaches a final status, commit or abort. Refer to the patch here for details; it appears to fix this issue, but it seemingly breaks the following upstream rule:
There should never be a case where XidInMVCCSnapshot says a transaction is
done while TransactionIdIsInProgress says it's still running. The other
way around is quite possible though.
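To make the workaround above concrete, here is a purely illustrative sketch (not the actual patch) of the "wait for a final status" idea, written against the xmax branch quoted above and using the existing TransactionIdIsInProgress and pg_usleep calls; whether waiting here is actually safe is exactly the open question below:

	if (!(snapshotCheckResult == XID_SURELY_COMMITTED ||
		  TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple))))
	{
		/*
		 * Illustrative sketch only: instead of assuming "aborted or
		 * crashed" right away, wait until the xmax reaches a final state
		 * before deciding which hint bit to set.
		 */
		while (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
			pg_usleep(1000L);	/* 1 ms; polling is for illustration only */

		if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
		{
			/* now it really has aborted or crashed */
			SetHintBits(tuple, buffer, relation, HEAP_XMAX_INVALID,
						InvalidTransactionId);
			return true;
		}
		/* otherwise fall through to the "xmax transaction committed" path */
	}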
A question in my mind: the code logically allows an in-progress status to be returned here but forces it to be treated as aborted (guarded only by a comment). Is that still true for GP, and how do we verify it?
A very simple local reproduction using isolation2:
create table t1(a int);
create table t2(a int);
1: begin;
1: select txid_current();
1: end;
2: begin;
2: select txid_current();
3: begin;
3: savepoint s1;
3: truncate t2;
4: begin;
4: select txid_current();
4: end;
5: begin;
5: select txid_current();
6: select * from t1 join (select oid from pg_class) x(a) on x.a = t1.a;
3: savepoint s2;
3: truncate t2;
1q:
2q:
3q:
4q:
5q:
6q:
A question in my mind: the code logically allows an in-progress status to be returned here but forces it to be treated as aborted (guarded only by a comment). Is that still true for GP, and how do we verify it?
Digging more, I have an answer to this question:
As the comments on TransactionIdIsInProgress indicate, looking up the clog is the slowest way to determine whether a transaction is still in progress, so there were other, faster ways to check the status from the PGXACT or PGPROC structures; that is how the old code did consider in-progress status. Later, another faster way, consulting XidInMVCCSnapshot, was introduced in commit https://github.com/greenplum-db/gpdb-postgres-merge/commit/8a7d0701814a4e293efad22091d6f6fb441bbe1c, replacing the call to TransactionIdIsInProgress.
That makes sense from a performance point of view, so the rule needs to be followed in GP as well.
Besides considering subxact status in the snapshot check, one minor piece of additional work may be to ensure that consulting the snapshot covers all the cases that TransactionIdIsInProgress covered; we may need to introduce some assertion code (e.g. via the slow path TransactionLogFetch?) to capture rogue scenarios.
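As a rough illustration of what such a cross-check could look like (an assumed sketch only, using the exported TransactionIdIsInProgress as the slow path rather than the static TransactionLogFetch), something like this could sit right after the snapshot check in assert-enabled builds:

#ifdef USE_ASSERT_CHECKING
	/*
	 * Illustrative sketch: per the upstream rule, whenever the snapshot says
	 * the xid is no longer running, the slow-path check must agree. A
	 * failure here is exactly the scenario that produces the bad hint bit.
	 */
	if (snapshotCheckResult != XID_IN_SNAPSHOT)
		Assert(!TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)));
#endif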
one minor piece of additional work may be to ensure that consulting the snapshot covers all the cases that TransactionIdIsInProgress covered; we may need to introduce some assertion code (e.g. via the slow path TransactionLogFetch?) to capture rogue scenarios.
I like the idea of having cross-checking via the slow path in assert-enabled builds, though we just need to make sure the code introduced for it doesn't cause too much pain during later upstream merges. Plus, it would be in a very hot area of code, so extreme caution is needed and everything should be introduced for assert-enabled builds only.
Fixed by https://github.com/greenplum-db/gpdb/pull/17327. I also added the case from here into that PR.
Reopening this issue since the first fix was reverted; we will have a new solution.