posthog icon indicating copy to clipboard operation
posthog copied to clipboard

feat(insights): Make all queries cacheable

Open Twixes opened this issue 1 year ago • 9 comments

Problem

Currently HogQLQuery and EventsQuery aren't cached in any way, which is pretty bad in terms of loading dashboards. If there is a specific reason for this, absolutely let me know! But my perspective currently is that it's overzealous to calculate new results every single time. We already have _is_stale() for determining staleness.

Changes

All query runner-based queries now use the query runner caching layer.

Twixes avatar Apr 29 '24 21:04 Twixes

@mariusandra I see you've re-ran the backend tests, but they aren't doing that badly, they've caught some actual issues. 😅 The big one is that HogQLQueryResponse doesn't conform to CachedQueryResponse, because the former has some extra unexpected fields, and CachedQueryResponse isn't prepared to deal with this.

I would prefer if CachedQueryResponse only had a field like child: HogQLQueryResponse | TrendsQueryResponse | etc. which would allow us to deal better with variability, but we'll have to work around the current schema somehow. Possibly just adding five more fields to CachedQueryResponse, which would unfortunately be unused for almost all other queries

Twixes avatar Apr 30 '24 07:04 Twixes

Hmm, whoops, my bad. Saw a lot of failures, yet my bad connection in the last cafe didn't let me look deeper. I could at least load the "run again" button, so that's what I clicked. The first test I clicked on did have a 2min runtime before flaking as well.

Regarding the five extra fields, what about just one complex object with a key like cacheControl or such? Nothing really against the extra fields, as we do already throw a lot of metadata and custom fields into all DataNode-s (e.g. hogql in the output, modifiers in the input, etc)

mariusandra avatar Apr 30 '24 07:04 mariusandra

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar May 02 '24 21:05 posthog-bot

Size Change: 0 B

Total Size: 1.05 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.05 MB

compressed-size-action

github-actions[bot] avatar May 02 '24 21:05 github-actions[bot]

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar May 02 '24 21:05 posthog-bot

Unfortunately this seemingly simple quest required shaving a yak, and this is a pretty big yak as far as yaks go. They don't actually go very far, as they weight between 200 and 700 kg… Anyway:

Problem

Our Pydantic models for the fresh versions of query responses are great. Each query type has its own shape of a response. When something is an EventsQueryResponse, you can sleep well knowing it has the right attributes matching the expected types.

But not cached responses! For these we only had CachedQueryResponse. It was clearly designed for classic insight queries, whose responses are pretty barebones. But let's say we want to cache an SQL insight – its response has a few extra, SQL-specific response fields.

Now, we could turn CachedQueryResponse into a monster with lots of optional fields, but then all hell can break loose with what's inside…

Changes

…or we can ensure that every *QueryResponse has its well-formed Cached*QueryResponse equivalent. Thanks to the power of TypeScript generics and extends, we can.

Twixes avatar May 02 '24 21:05 Twixes