feat(wip): add optimizer for unique user counts query
hackathon team project with @sakce @andehen
(This is hidden behind a HogQLQueryModifier and can only be accessed through the debug UI for now)
This PR can improve query speed, but saving and reusing intermediate results.
This works by creating a new table to store the results, then when a query runs
- Look for select expressions (can be nested) that can be optimized
- For each select expression
- transform it to a query that can insert the preagg rows, and another query that queries the preagg rows
- find missing contiguous data ranges
- for each missing date range, run the insert query
- Replace the select statement in the original query with the query over the preagg rows
How did you test this code?
Added some tests to make sure that this runs successfully on the default trends query! Plus a few other queries
Changelog: (features only) Is this feature complete?
Wiz Scan Summary
⚠️ Many findings detected
Many findings were detected, but only a subset of the findings are displayed inline due to API constraints. To view all findings inline, please click here.
| Scanner | Findings |
|---|---|
| 22 |
|
| 1 |
|
| - | |
| - | |
| - | |
| - | |
| Total | 22 |
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.
Size Change: 0 B
Total Size: 3.7 MB
ℹ️ View Unchanged
| Filename | Size |
|---|---|
frontend/dist/toolbar.js |
3.7 MB |
Migration SQL Changes
Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
products/analytics_platform/backend/migrations/0001_preaggregation_job.py
BEGIN;
--
-- Create model PreaggregationJob
--
CREATE TABLE "analytics_platform_preaggregationjob" ("created_at" timestamp with time zone NOT NULL, "id" uuid NOT NULL PRIMARY KEY, "time_range_start" timestamp with time zone NOT NULL, "time_range_end" timestamp with time zone NOT NULL, "query_hash" varchar(64) NOT NULL, "status" varchar(20) NOT NULL, "computed_at" timestamp with time zone NULL, "updated_at" timestamp with time zone NOT NULL, "error" text NULL, "created_by_id" integer NULL, "team_id" integer NOT NULL, CONSTRAINT "time_range_start_before_end" CHECK ("time_range_start" < ("time_range_end")));
ALTER TABLE "analytics_platform_preaggregationjob" ADD CONSTRAINT "analytics_platform_p_created_by_id_3ee332dd_fk_posthog_u" FOREIGN KEY ("created_by_id") REFERENCES "posthog_user" ("id") DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE "analytics_platform_preaggregationjob" ADD CONSTRAINT "analytics_platform_p_team_id_8584a964_fk_posthog_t" FOREIGN KEY ("team_id") REFERENCES "posthog_team" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE INDEX "analytics_platform_preaggregationjob_created_by_id_3ee332dd" ON "analytics_platform_preaggregationjob" ("created_by_id");
CREATE INDEX "analytics_platform_preaggregationjob_team_id_8584a964" ON "analytics_platform_preaggregationjob" ("team_id");
CREATE INDEX "analytics_p_team_id_f37a89_idx" ON "analytics_platform_preaggregationjob" ("team_id", "query_hash");
CREATE INDEX "analytics_p_team_id_b17148_idx" ON "analytics_platform_preaggregationjob" ("team_id", "status");
CREATE INDEX "analytics_p_team_id_4e65a5_idx" ON "analytics_platform_preaggregationjob" ("team_id", "time_range_start", "time_range_end");
COMMIT;
Last updated: 2025-12-15 23:43 UTC (50430db)
🔍 Migration Risk Analysis
We've analyzed your migrations for potential risks.
Summary: 1 Safe | 0 Needs Review | 0 Blocked
✅ Safe
Brief or no lock, backwards compatible
analytics_platform.0001_preaggregation_job
└─ #1 ✅ CreateModel
Creating new table is safe
model: PreaggregationJob
│
└──> ℹ️ INFO:
ℹ️ Skipped operations on newly created tables (empty tables
don't cause lock contention).
Last updated: 2025-12-15 23:43 UTC (50430db)
Query snapshots: Backend query snapshots updated
Changes: 2 snapshots (2 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change
Query snapshots: Backend query snapshots updated
Changes: 1 snapshots (1 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change
Query snapshots: Backend query snapshots updated
Changes: 1 snapshots (1 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change