timescaledb
timescaledb copied to clipboard
[Bug]: Segmentation fault when INNER JOINing two hypertables
What type of bug is this?
Crash, Unexpected error
What subsystems and features are affected?
Query executor
What happened?
I am trying to run a query on a hypertable, while also INNER JOIN
ing it with another hypertable. However, this causes a segmentation fault. The expected behavior is for the query to complete without a segmentation fault.
Changing the query to use a LEFT JOIN
works as expected.
TimescaleDB version affected
2.7.0
PostgreSQL version used
14.3
What operating system did you use?
Ubuntu 20.04.4 x86-64
What installation method did you use?
Deb/Apt
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
2022-06-05 15:34:25.631 UTC [21278] LOG: server process (PID 146466) was terminated by signal 11: Segmentation fault
2022-06-05 15:34:25.631 UTC [21278] DETAIL: Failed process was running: SELECT time_bucket_gapfill('1d', tbl1.timestamp) as time,
count(*) as count
FROM table1 tbl1
INNER JOIN table2 tbl2
ON tbl1.tbl2_hash = tbl2.hash
AND tbl1.timestamp = tbl2.timestamp
WHERE tbl1.timestamp BETWEEN '2022-01-25T21:42:21.182Z' AND '2022-06-01T21:42:21.182Z'
GROUP BY "time"
ORDER BY "time"
2022-06-05 15:34:25.631 UTC [21278] LOG: terminating any other active server processes
2022-06-05 15:34:25.831 UTC [21278] LOG: all server processes terminated; reinitializing
2022-06-05 15:34:26.209 UTC [146469] LOG: database system was interrupted; last known up at 2022-06-05 15:34:11 UTC
2022-06-05 15:34:26.209 UTC [146470] [unknown]@[unknown] LOG: PID 146259 in cancel request did not match any process
2022-06-05 15:34:26.209 UTC [146471] [unknown]@[unknown] LOG: PID 146261 in cancel request did not match any process
2022-06-05 15:34:26.259 UTC [146469] LOG: database system was not properly shut down; automatic recovery in progress
2022-06-05 15:34:26.261 UTC [146469] LOG: redo starts at 9285/BF1C36D8
2022-06-05 15:34:26.833 UTC [146472] [unknown]@[unknown] LOG: PID 146260 in cancel request did not match any process
2022-06-05 15:34:26.833 UTC [146473] [unknown]@[unknown] LOG: PID 146262 in cancel request did not match any process
2022-06-05 15:34:26.834 UTC [146474] [unknown]@[unknown] LOG: PID 146263 in cancel request did not match any process
2022-06-05 15:34:27.483 UTC [146469] LOG: redo done at 9285/F3CA1028 system usage: CPU: user: 0.49 s, system: 0.72 s, elapsed: 1.22 s
How can we reproduce the bug?
It is not entirely clear to me the exact steps to reproduce, but I've reproduced this on multiple instances with somewhat similar steps:
1. Create two hypertables
2. Populate each hypertable with 1,000,000+ rows
a) Each row in table1 has a reference to a row in table2
b) Table2 contains more rows than table1
3. Run query to INNER JOIN table1 and table2
@boxhock can you please share the stack trace of the crashing backend? You can use the below to get a stack trace on ubuntu:
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#On_Ubuntu
Hey @boxhock ... did u had a change to get a stack trace as Nikhil asked before?
hey @boxhock any chance to provide a stack trace for us? Also, can you please share with us the exact schemes of the tables you used for reproducing the issue and maybe some sample data? These would help us to be able to constantly reproduce the issue and find the root cause. thank you.
This issue has been automatically marked as stale due to lack of activity. You can remove the stale label or comment. Otherwise, this issue will be closed in 30 days. Thank you!
I have the same problem on PG 14.5 with TSDB 2.8.0, getting a segfault when using time_bucket_gapfill
with an INNER JOIN
between two hypertables. Considering going back to v1.7.5 😞
I'm running Postgres in Docker, and I've no clue how to get a stack trace out of it - if someone can advise, I'd be happy to try?
Hello @cocowalla,
Thank you very much for reaching out. Creating stack traces in a Docker container can be laborious. Do you have a self-contained example that reproduces the crash that you can share? I would then try to reproduce the crash on my system and generate a stack trace.
@jnidzwetzki I was literally just putting something together, and I saw a notification that it's fixed!
What's the easiest way for me to test it before it hits v2.8.1 please?
Easiest way is the nightly docker image: timescaledev/timescaledb:nightly-pg14
. You might have to wait a day till tomorrow till it picks up the bugfix.
Easiest way is the nightly docker image:
timescaledev/timescaledb:nightly-pg14
. You might have to wait a day till tomorrow till it picks up the bugfix.
The latest timescaledev build is 20 days old (seems odd for a Nightly tag 😕!), so I built it myself instead. Now, the segfault is fixed, which is great!
But, this issue only seems to be partially resolved - if I do an INNER JOIN
between tables, or I do a LEFT JOIN
and include the joined table in the WHERE
clause, I get:
ORDER/GROUP BY expression not found in targetlist
I'll prepare a repro and create a new issue.
@jnidzwetzki I just opened issue #4774 in relation to this