posthog icon indicating copy to clipboard operation
posthog copied to clipboard

feat(data-warehouse): add direct query mode for Postgres data sources

Open andrewm4894 opened this issue 1 month ago â€ĸ 6 comments

Problem

Users want to query external Postgres databases directly from PostHog without syncing all the data first. Currently, all data sources require batch ingestion via Temporal workflows, which means waiting for data to sync before querying.

This is the first PR in a series to enable direct/passthrough querying of external databases. This PR adds the foundation - a new is_direct_query flag that allows creating Postgres connections with schema discovery but no batch ingestion.

Changes

Backend:

  • Add is_direct_query boolean field to ExternalDataSource model
  • Skip billing check for direct query sources (no data stored)
  • Set source status to "Completed" immediately (no sync needed)
  • Force should_sync=False on all schemas
  • Skip Temporal workflow creation for direct query sources
  • Block reload() action for direct query sources
  • Return "Direct query" as status in API responses

Schema Metadata Discovery (for direct query sources):

  • Add get_primary_keys() function to discover primary key columns
  • Add get_foreign_keys() function to discover FK relationships
  • Add get_indexes() function to discover all table indexes
  • Store metadata in ExternalDataSchema.sync_type_config["schema_metadata"]
  • Add helper properties: schema.primary_key, schema.foreign_keys, schema.columns, schema.indexes
  • Metadata includes:
    • Primary keys (single and composite)
    • Foreign keys with target table/column references
    • Column names and data types
    • Index names, columns, uniqueness, and primary status

Frontend:

  • Add "Direct query mode" toggle in Postgres source creation form
  • Skip schema selection step for direct query sources
  • Pass schema metadata (PK/FK/indexes) when creating direct query sources
  • Update success screen messaging:
    • Title: "Data source connected" instead of "Importing your data..."
    • Body: "You can now query this data source directly from PostHog"
    • Status: "Available" instead of "Not synced"

How did you test this code?

  • Manual testing with local Postgres (Pagila demo database)
  • Verified source creation with is_direct_query=True
  • Verified all schemas have should_sync=False
  • Verified no Temporal workflows are created
  • Verified PK/FK/index metadata is correctly discovered and stored
  • Added backend unit tests for direct query source behavior

Changelog: (features only) Is this feature complete?

No - this is PR 1 of 2. PR 2 will add the actual query execution engine to query direct sources.

andrewm4894 avatar Dec 10 '25 13:12 andrewm4894

Size Change: +651 B (+0.02%)

Total Size: 3.66 MB

â„šī¸ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 3.66 MB +651 B (+0.02%)

compressed-size-action

github-actions[bot] avatar Dec 10 '25 13:12 github-actions[bot]

Migration SQL Changes

Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:

products/data_warehouse/backend/migrations/0012_add_is_direct_query.py

BEGIN;
--
-- Add field is_direct_query to externaldatasource
--
ALTER TABLE "posthog_externaldatasource" ADD COLUMN "is_direct_query" boolean DEFAULT false NOT NULL;
ALTER TABLE "posthog_externaldatasource" ALTER COLUMN "is_direct_query" DROP DEFAULT;
COMMIT;

Last updated: 2025-12-10 14:06 UTC (df126ef)

github-actions[bot] avatar Dec 10 '25 13:12 github-actions[bot]

🔍 Migration Risk Analysis

We've analyzed your migrations for potential risks.

Summary: 1 Safe | 0 Needs Review | 0 Blocked

✅ Safe

Brief or no lock, backwards compatible

data_warehouse.0012_add_is_direct_query
  └─ #1 ✅ AddField
     Adding NOT NULL field with constant default (safe in PG11+)
     model: externaldatasource, field: is_direct_query

Last updated: 2025-12-10 14:06 UTC (df126ef)

github-actions[bot] avatar Dec 10 '25 13:12 github-actions[bot]

â­ī¸ Skipped snapshot commit because branch advanced to 05d913e while workflow was testing b28577a.

The new commit will trigger its own snapshot update workflow.

If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:

  • Merge master into your branch, or
  • Push an empty commit: git commit --allow-empty -m 'trigger CI' && git push

posthog-bot avatar Dec 10 '25 13:12 posthog-bot

â­ī¸ Skipped snapshot commit because branch advanced to 1d8922a while workflow was testing 05d913e.

The new commit will trigger its own snapshot update workflow.

If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:

  • Merge master into your branch, or
  • Push an empty commit: git commit --allow-empty -m 'trigger CI' && git push

posthog-bot avatar Dec 10 '25 13:12 posthog-bot

Query snapshots: Backend query snapshots updated

Changes: 3 snapshots (3 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

posthog-bot avatar Dec 10 '25 14:12 posthog-bot

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

github-actions[bot] avatar Dec 18 '25 07:12 github-actions[bot]