timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

[Bug]: Chunks not excluded when JOINing with materialized CTE

Open amCap1712 opened this issue 3 years ago • 3 comments

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query executor, Query planner

What happened?

I have a CTE that reads a value from a table and then the next part of the query uses this value as a lower bound is to scan a hypertable. I have tried playing with materialized/not materialized in the CTE but that doesn't seem to work. The query and the plan are available here: https://explain.dalibo.com/plan/q7W#query. (The conditions added to the outer query and the LIMIT 1 are vacuous because those will always be true, I only added them due to some suggestions in forums that it may help the query planner. But it didn't help.)

However, if I try a slightly different query and instead of JOINing with the CTE, select from it using a subquery. The query plan improves. I can see in it there is runtime chunk exclusion. Since the query will only touch 1 or 2 latest chunks, this is a great performance improvement. https://explain.dalibo.com/plan/Vo6#query

TimescaleDB version affected

2.6.0

PostgreSQL version used

13.6

What operating system did you use?

Official Timescale Docker Alpine Image running on Ubuntu 20.04 host

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

No response

amCap1712 avatar Mar 15 '22 11:03 amCap1712

I've also seen that the postgrseql/TDB planner is confused when a hypertable is JOINed with another table. I've also tested with a regular PostgreSQL table, and did not see the same problem there.

I have a hypertable, and a very small side table (e.g. even when it contains a single row containing just a device_id & device_name); and created a view over these so that I can have device_name when I query the view.

The hypertable is indexed by a timestamp column.

I've seen with a normal select over the view, that Postgres uses nested loop when joining which is so slower than a hash join. When joining a regular postgres table, it correctly used a hash join but with a hypertable it used nested loop. I've been able to work around this problem with disabling nested loop for the problematic query.

However, recently I've got a big problem with a simple query like:

SELECT * FROM my_joined_view ORDER BY ts_column DESC limit 1;

The above query does a sequential scan over all partitions, sorts them (despite the fact that we have indexes over ts_column), and finally returns a single row.

However, if I run the above query over the hypertable itself, I see it correctly uses the ts index, retrives only a single row from each partition and finally returns a single row, which is much faster.

hedayat avatar Jul 12 '22 12:07 hedayat

Could you post a self-contained script to reproduce the situation including the view definition.

svenklemm avatar Jul 23 '22 13:07 svenklemm

Looks like its not easy to reproduce; it probably needs a really big db. I'm still trying to reproduce the problem with an script producing random data.

hedayat avatar Aug 10 '22 16:08 hedayat

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!

github-actions[bot] avatar Oct 10 '22 02:10 github-actions[bot]

Dear Author,

We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it. Thank you!

github-actions[bot] avatar Nov 10 '22 02:11 github-actions[bot]