timescaledb
[Bug]: INSERT INTO table [...] WHERE NOT EXISTS (SELECT 1 FROM table WHERE some_condition IS TRUE) causes segfault
What type of bug is this?
Crash
What subsystems and features are affected?
Query executor
What happened?
INSERT INTO table [...] WHERE NOT EXISTS (SELECT 1 FROM table WHERE some_condition IS TRUE)
causes an instant segfault for a partition of a certain size (likely fully occupied). The affected table is chunked on a scalar BIGSERIAL column
with chunk_time_interval => 25000000.
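For illustration only, a minimal schematic of the failing pattern; the table and column names here are made up, and the real schema and statement are given in the reproduction details below.

-- Schematic only: an integer-partitioned hypertable plus the
-- INSERT ... SELECT ... WHERE NOT EXISTS shape that triggers the crash.
CREATE TABLE t (id bigserial PRIMARY KEY, payload text NOT NULL);
SELECT create_hypertable('t', 'id', chunk_time_interval => 25000000);

INSERT INTO t (payload)
SELECT 'value'
WHERE NOT EXISTS (SELECT 1 FROM t WHERE payload = 'value' AND id >= 1);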
TimescaleDB version affected
2.13.1
PostgreSQL version used
15.5
What operating system did you use?
CentOS 7.9.2009
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
#0 pg_detoast_datum_packed (datum=datum@entry=0xfffffffffffffffd) at fmgr.c:1742
#1 0x00000000008daef2 in text_to_cstring (t=0xfffffffffffffffd) at varlena.c:224
#2 0x000000000090b890 in FunctionCall1Coll (flinfo=0x7ffcfc56ee30, collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1124
#3 0x000000000090c8b6 in OutputFunctionCall (val=18446744073709551613, flinfo=0x7ffcfc56ee30) at fmgr.c:1561
#4 OidOutputFunctionCall (functionId=<optimized out>, val=18446744073709551613) at fmgr.c:1644
#5 0x000000000065c97c in ExecBuildSlotValueDescription (reloid=26937, slot=slot@entry=0x2f05a58, tupdesc=tupdesc@entry=0x7fbc57f04738, modifiedCols=0x2f05e78, maxfieldlen=64) at execMain.c:2294
#6 0x000000000065e83d in ExecConstraints (resultRelInfo=resultRelInfo@entry=0x2ea5728, slot=slot@entry=0x2f05a58, estate=estate@entry=0x2e46878) at execMain.c:1962
#7 0x00007fbc587221f8 in ExecInsert (context=context@entry=0x7ffcfc56f170, resultRelInfo=0x2ea5728, slot=0x2f05a58, canSetTag=<optimized out>)
at /home/builder/rpmbuild/BUILD/timescaledb/src/nodes/hypertable_modify.c:1722
#8 0x00007fbc58722da5 in ExecModifyTable (pstate=0x2e47c08, cs_node=0x2e477e8) at /home/builder/rpmbuild/BUILD/timescaledb/src/nodes/hypertable_modify.c:1021
#9 hypertable_modify_exec (node=0x2e477e8) at /home/builder/rpmbuild/BUILD/timescaledb/src/nodes/hypertable_modify.c:182
#10 0x000000000065cb52 in ExecProcNode (node=0x2e477e8) at ../../../src/include/executor/executor.h:262
#11 ExecutePlan (execute_once=<optimized out>, dest=0x7fbccc7f6540, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_INSERT, use_parallel_mode=<optimized out>,
planstate=0x2e477e8, estate=0x2e46878) at execMain.c:1636
#12 standard_ExecutorRun (queryDesc=0x2e48888, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#13 0x00007fbcbd75467d in pgss_ExecutorRun (queryDesc=0x2e48888, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1010
#14 0x00000000007d1b0a in ProcessQuery (plan=<optimized out>,
sourceText=0x2cc62e8 "INSERT INTO table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10, column11, column12, column13, column14, column15, column16"..., params=0x0, queryEnv=0x0, dest=0x7fbccc7f6540, qc=0x7ffcfc56f5e0) at pquery.c:160
#15 0x00000000007d2595 in PortalRunMulti (portal=portal@entry=0x2d4ac08, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x7fbccc7f6540,
altdest=altdest@entry=0x7fbccc7f6540, qc=qc@entry=0x7ffcfc56f5e0) at pquery.c:1277
#16 0x00000000007d291c in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x7fbccc7f6540, altdest=0x7fbccc7f6540,
qc=0x7ffcfc56f5e0) at pquery.c:791
#17 0x00000000007ce877 in exec_simple_query (
query_string=0x2cc62e8 "INSERT INTO table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10, column11, column12, column13, column14, column15, column16"...) at postgres.c:1250
#18 0x00000000007cee78 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4598
#19 0x0000000000493a07 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4514
#20 BackendStartup (port=0x2ced7b0) at postmaster.c:4242
#21 ServerLoop () at postmaster.c:1809
#22 0x000000000074cc0d in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2cc0d10) at postmaster.c:1481
#23 0x0000000000494876 in main (argc=3, argv=0x2cc0d10) at main.c:202
How can we reproduce the bug?
We can replicate the segfault on two separate servers with identical configuration every time; however, the crash depends on partition occupancy. It does not occur if a partition is not fully occupied. We have not determined the exact occupancy threshold needed to trigger the segfault.
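As a side note, one way to look for that threshold is to compare approximate per-chunk row counts against the chunk interval. A hedged sketch, assuming the timescaledb_information.chunks view of TimescaleDB 2.x and the test_table name used in the DDL further down, could look like this:

-- Approximate rows per chunk of test_table; reltuples is the planner's
-- estimate (kept fresh by autovacuum/ANALYZE), which is enough to compare
-- chunk occupancy against the 25,000,000-wide chunk_time_interval.
SELECT c.chunk_schema,
       c.chunk_name,
       c.range_start_integer,
       c.range_end_integer,
       cl.reltuples::bigint AS approx_rows
FROM timescaledb_information.chunks AS c
JOIN pg_class AS cl
  ON cl.oid = format('%I.%I', c.chunk_schema, c.chunk_name)::regclass
WHERE c.hypertable_name = 'test_table'
ORDER BY c.range_start_integer;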
@kimec thanks for reaching out, can you send a complete reproducible SQL script including the complete INSERT statement?
Hello @fabriziomello,
The DDL for the table is
CREATE TABLE public.test_table
(
id bigserial NOT NULL,
id2 int8 NOT NULL,
id3 text NOT NULL,
column1 text NOT NULL,
column2 text NOT NULL,
column3 numeric NOT NULL,
column4 timestamp NOT NULL,
column5 int8 NOT NULL,
column6 int8 NOT NULL DEFAULT 0,
column7 text NULL,
column8 text NULL,
column9 bool NOT NULL DEFAULT FALSE,
column10 text NOT NULL DEFAULT '-'::text,
column11 text NOT NULL DEFAULT '-'::text,
column12 text NOT NULL DEFAULT '-'::text,
column13 bool NOT NULL DEFAULT FALSE,
column14 timestamp NOT NULL DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP),
column15 numeric NULL,
column16 numeric NULL,
column17 text NULL,
column18 text NULL,
column19 text NULL,
column20 text NULL,
column21 text NULL,
column22 numeric NULL,
column23 numeric NULL,
column24 int4 NULL,
column25 int4 NULL,
column26 text NULL,
column27 bool NULL,
column28 bool NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
);
CREATE INDEX idx_column17 ON public.test_table USING btree (column17);
CREATE INDEX test_table_history_idx ON public.test_table USING btree (column5, id DESC) WHERE (NOT ((column3 = (0)::numeric) OR (column1 = 'XXXXX'::text)));
CREATE INDEX test_table_id2_idx ON public.test_table USING btree (id2);
CREATE INDEX test_table_column5 ON public.test_table USING btree (column5);
CREATE INDEX test_table_id3 ON public.test_table USING btree (id3);
SELECT create_hypertable('test_table', 'id', chunk_time_interval => 25000000);
An example INSERT statement is:
INSERT INTO test_table (id2, id3, column1, column2, column21, column3, column4, column5, column17, column6,
column10, column11, column12, column15, column16, column19, column20, column22, column23,
column24, column25, column18, column7, column27, column28)
SELECT
123,
'c7aa52f76c522c5137b1927348bb3271',
'SENTINEL1',
'aa',
'aa',
0,
'2024-01-17 10:24:02',
15294609,
'SENTINEL2',
-3,
'bb',
'cc',
'dd',
0.0,
0.0,
'ee',
'ee',
0,
0,
0,
0,
NULL,
NULL,
'FALSE',
'TRUE'
WHERE NOT EXISTS (SELECT 1 FROM test_table WHERE column17 = 'SENTINEL2' AND column1 = 'SENTINEL1' AND id >= 1);
I will try to come up with some generator script next week. We did our testing by dumping data from the affected system and restoring it on a separate test system. Unfortunately, the dumped dataset is quite large and contains sensitive data.
@kimec I did some tests with your schema and was not able to reproduce the problem. One potential workaround could be to create a unique index and use INSERT ... ON CONFLICT DO NOTHING.
In any case, it would be nice if you could come up with a reproducible test case so we can provide a fix.
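For reference, a minimal sketch of that workaround, with a purely illustrative deduplication key; note that a unique index on a hypertable must include the partitioning column (id here), which constrains what the key can look like.

-- Hypothetical deduplication key; it must include the partitioning
-- column "id" because unique indexes on a hypertable have to cover it.
CREATE UNIQUE INDEX test_table_dedup_idx
    ON public.test_table (id, column1, column17);

-- Same idea as the original statement, but the NOT EXISTS guard is
-- replaced by ON CONFLICT DO NOTHING; only the NOT NULL columns
-- without defaults are listed to keep the sketch short.
INSERT INTO test_table (id2, id3, column1, column2, column3, column4, column5)
VALUES (123, 'c7aa52f76c522c5137b1927348bb3271', 'SENTINEL1', 'aa', 0,
        '2024-01-17 10:24:02', 15294609)
ON CONFLICT DO NOTHING;

Because id is itself unique, whether such a key actually deduplicates anything depends on the data; the index columns above are only meant to show the shape of the workaround.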
We tried to create a reproducer but were not able to trigger the segfault on artificially generated data. I should note that the dataset which causes the segfault was migrated from Postgres 10 / Timescale 1.6. That said, we migrated the dataset by incrementally upgrading both Postgres and Timescale in steps according to the compatibility charts in the TS documentation. We will be able to drop the migrated dataset entirely in a month or two, by which time all our live partitions will have been created by Pg 15 and TS 2.13 exclusively. I will get back to you then on whether the issue still persists on new partitions. If not, I will close the issue. Since this issue may be related to our specific upgrade path, please treat it with a lower priority.