posthog
wip: feat: move query performance polling to its own process
Problem
Query performance polling would block if ClickHouse was slow to return performance info.
Changes
Move the polling of ClickHouse to its own management process.
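A minimal sketch of the idea, not the PR's actual code: the poller runs in a separate process, so a slow ClickHouse response only stalls that process while the caller returns immediately. `slow_fetch_performance_info`, `poll_loop`, and `start_poller` are hypothetical names, `time.sleep` stands in for a slow query, and the `fork` start method assumes a POSIX host.

```python
import multiprocessing as mp
import time


def slow_fetch_performance_info(delay_s: float) -> dict:
    # Hypothetical stand-in for a slow ClickHouse performance query
    time.sleep(delay_s)
    return {"rows_read": 0, "time_elapsed": delay_s}


def poll_loop(results, stop, interval_s: float, delay_s: float) -> None:
    # Runs in its own process: a slow fetch only delays this loop
    while not stop.is_set():
        results.put(slow_fetch_performance_info(delay_s))
        stop.wait(interval_s)


def start_poller(interval_s: float = 0.1, delay_s: float = 0.5):
    # Assumption: POSIX host; "fork" means the target is not pickled
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    stop = ctx.Event()
    proc = ctx.Process(
        target=poll_loop, args=(results, stop, interval_s, delay_s), daemon=True
    )
    proc.start()
    return proc, results, stop


if __name__ == "__main__":
    started = time.monotonic()
    proc, results, stop = start_poller(interval_s=0.1, delay_s=0.5)
    # The caller is not blocked by the slow fetch
    print(f"caller free after {time.monotonic() - started:.3f}s")
    print(results.get(timeout=5))
    stop.set()
    proc.join(timeout=5)
```

The caller only reads results from the queue, so it never waits on ClickHouse directly.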
Does this work well for both Cloud and self-hosted?
Yes
How did you test this code?
Tested it locally
Existing Issues For Review
Your pull request is modifying functions with the following pre-existing issues:
File: posthog/clickhouse/client/execute_async.py
| Function | Unhandled Issue |
|---|---|
| get_query_status | QueryNotFoundError: Query 75151637-6fe7-499f-b6d5-01e9cbb8fd27 not found for team 18556 ... Event Count: 1 |
UI snapshots have been updated
2 snapshot changes in total. 0 added, 2 modified, 0 deleted:
chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
Size Change: 0 B
Total Size: 1.05 MB
Unchanged
| Filename | Size |
|---|---|
| frontend/dist/toolbar.js | 1.05 MB |
Hmm, in principle this looks good to me, but I can't get it to work locally. I also can't get the version on master to work locally: I just see
query_progress: {active_cpu_time: 0, bytes_read: 0, estimated_rows_total: 0, rows_read: 0, time_elapsed: 0}
every time, so it's hard to try this properly. Do you know what's up?
There are a couple of things here. Your queries might run so quickly locally that they finish and disappear before the polling picks them up.
To make them show up, you'd need to add this "sleep" command to HogQL and insert a sleep command into the trends_query_builder, as done here:
https://github.com/PostHog/posthog/pull/22298/commits/f3d42897e5abbc227baa0bbfa11c463d4fc869c9
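A simplified model of why fast local queries go unseen, under stated assumptions: the poller samples running queries at fixed ticks, and a query is observed only if a tick lands while it is still running. The `Query` type, `seen_by_poller`, and the 1000 ms interval are hypothetical, and the model ignores the time each poll itself takes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    query_id: str
    start_ms: int
    duration_ms: int


def seen_by_poller(queries, interval_ms: int, horizon_ms: int) -> list:
    """Return ids of queries running at one or more poll ticks (integer ms avoids float drift)."""
    ticks = range(0, horizon_ms + 1, interval_ms)
    seen = []
    for q in queries:
        end_ms = q.start_ms + q.duration_ms
        # A query is visible only while start_ms <= tick < end_ms
        if any(q.start_ms <= t < end_ms for t in ticks):
            seen.append(q.query_id)
    return seen


if __name__ == "__main__":
    queries = [Query("fast", 100, 50), Query("slow_with_sleep", 100, 3000)]
    print(seen_by_poller(queries, interval_ms=1000, horizon_ms=5000))  # → ['slow_with_sleep']
```

Padding a query with a sleep stretches its running window until it spans a tick, which is the effect the sleep command is meant to produce.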
Even with that, I find that Celery is super unreliable locally, which is also why async queries don't work reliably in dev.
Starting multiple Celery instances helps. Our local Celery runs in solo mode, so I think it's basically single-threaded, which prevents the polling from running while an async query is running. You can allow multiple Celery instances to run by editing the run config in PyCharm.
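To illustrate the solo-mode point, a sketch that models one solo-mode worker as a one-thread executor (an assumption for illustration, not Celery's API): a poll task queued behind a long async query must wait for it, while a second worker lets the poll start right away. `long_async_query`, `poll_task`, and `poll_wait` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_async_query(duration_s: float) -> str:
    # Hypothetical stand-in for an async query task occupying a worker
    time.sleep(duration_s)
    return "query done"


def poll_task(submitted_at: float) -> float:
    # Returns how long this poll waited before it started running
    return time.monotonic() - submitted_at


def poll_wait(workers: int, query_duration_s: float = 0.5) -> float:
    """Submit a long query, then a poll; report the poll's queueing delay."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pool.submit(long_async_query, query_duration_s)
        submitted_at = time.monotonic()
        return pool.submit(poll_task, submitted_at).result()


if __name__ == "__main__":
    print(f"1 worker: poll waited {poll_wait(1):.2f}s")
    print(f"2 workers: poll waited {poll_wait(2):.2f}s")
```

With one worker the poll waits roughly the full query duration; with two it starts almost immediately, which is consistent with running multiple Celery instances helping.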
Even with all this, it's still a little flaky. I find I have to restart celery a few times until it works, or click around and load a bunch of insights until the data shows up.