posthog icon indicating copy to clipboard operation
posthog copied to clipboard

feat(wip): add optimizer for unique user counts query

Open robbie-c opened this issue 1 month ago • 4 comments

hackathon team project with @sakce @andehen

(This is hidden behind a HogQLQueryModifier and can only be accessed through the debug UI for now)

This PR can improve query speed, but saving and reusing intermediate results.

This works by creating a new table to store the results, then when a query runs

  • Look for select expressions (can be nested) that can be optimized
  • For each select expression
    • transform it to a query that can insert the preagg rows, and another query that queries the preagg rows
    • find missing contiguous data ranges
    • for each missing date range, run the insert query
    • Replace the select statement in the original query with the query over the preagg rows

How did you test this code?

Added some tests to make sure that this runs successfully on the default trends query! Plus a few other queries

Changelog: (features only) Is this feature complete?

robbie-c avatar Dec 10 '25 13:12 robbie-c

Wiz Scan Summary

⚠️ Many findings detected
Many findings were detected, but only a subset of the findings are displayed inline due to API constraints. To view all findings inline, please click here.
Scanner Findings
Vulnerability Finding Vulnerabilities 22 High 22 Medium 13 Low
Data Finding Sensitive Data 1 Info
Secret Finding Secrets -
IaC Misconfiguration IaC Misconfigurations -
SAST Finding SAST Findings -
Software Supply Chain Finding Software Supply Chain Findings -
Total 22 High 22 Medium 13 Low 1 Info

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

wiz-7ad640923b[bot] avatar Dec 10 '25 13:12 wiz-7ad640923b[bot]

Size Change: 0 B

Total Size: 3.7 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 3.7 MB

compressed-size-action

github-actions[bot] avatar Dec 10 '25 14:12 github-actions[bot]

Migration SQL Changes

Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:

products/analytics_platform/backend/migrations/0001_preaggregation_job.py

BEGIN;
--
-- Create model PreaggregationJob
--
CREATE TABLE "analytics_platform_preaggregationjob" ("created_at" timestamp with time zone NOT NULL, "id" uuid NOT NULL PRIMARY KEY, "time_range_start" timestamp with time zone NOT NULL, "time_range_end" timestamp with time zone NOT NULL, "query_hash" varchar(64) NOT NULL, "status" varchar(20) NOT NULL, "computed_at" timestamp with time zone NULL, "updated_at" timestamp with time zone NOT NULL, "error" text NULL, "created_by_id" integer NULL, "team_id" integer NOT NULL, CONSTRAINT "time_range_start_before_end" CHECK ("time_range_start" < ("time_range_end")));
ALTER TABLE "analytics_platform_preaggregationjob" ADD CONSTRAINT "analytics_platform_p_created_by_id_3ee332dd_fk_posthog_u" FOREIGN KEY ("created_by_id") REFERENCES "posthog_user" ("id") DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE "analytics_platform_preaggregationjob" ADD CONSTRAINT "analytics_platform_p_team_id_8584a964_fk_posthog_t" FOREIGN KEY ("team_id") REFERENCES "posthog_team" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE INDEX "analytics_platform_preaggregationjob_created_by_id_3ee332dd" ON "analytics_platform_preaggregationjob" ("created_by_id");
CREATE INDEX "analytics_platform_preaggregationjob_team_id_8584a964" ON "analytics_platform_preaggregationjob" ("team_id");
CREATE INDEX "analytics_p_team_id_f37a89_idx" ON "analytics_platform_preaggregationjob" ("team_id", "query_hash");
CREATE INDEX "analytics_p_team_id_b17148_idx" ON "analytics_platform_preaggregationjob" ("team_id", "status");
CREATE INDEX "analytics_p_team_id_4e65a5_idx" ON "analytics_platform_preaggregationjob" ("team_id", "time_range_start", "time_range_end");
COMMIT;

Last updated: 2025-12-15 23:43 UTC (50430db)

github-actions[bot] avatar Dec 10 '25 14:12 github-actions[bot]

🔍 Migration Risk Analysis

We've analyzed your migrations for potential risks.

Summary: 1 Safe | 0 Needs Review | 0 Blocked

✅ Safe

Brief or no lock, backwards compatible

analytics_platform.0001_preaggregation_job
  └─ #1 ✅ CreateModel
     Creating new table is safe
     model: PreaggregationJob
  │
  └──> ℹ️  INFO:
       ℹ️  Skipped operations on newly created tables (empty tables
       don't cause lock contention).

Last updated: 2025-12-15 23:43 UTC (50430db)

github-actions[bot] avatar Dec 10 '25 14:12 github-actions[bot]

Query snapshots: Backend query snapshots updated

Changes: 2 snapshots (2 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

posthog-bot avatar Dec 10 '25 15:12 posthog-bot

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

posthog-bot avatar Dec 11 '25 14:12 posthog-bot

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

posthog-bot avatar Dec 15 '25 23:12 posthog-bot