feat(data-warehouse): add direct query mode for Postgres data sources
Problem
Users want to query external Postgres databases directly from PostHog without syncing all the data first. Currently, all data sources require batch ingestion via Temporal workflows, which means waiting for data to sync before querying.
This is the first PR in a series to enable direct/passthrough querying of external databases. This PR adds the foundation - a new is_direct_query flag that allows creating Postgres connections with schema discovery but no batch ingestion.
Changes
Backend:
- Add
is_direct_queryboolean field toExternalDataSourcemodel - Skip billing check for direct query sources (no data stored)
- Set source status to "Completed" immediately (no sync needed)
- Force
should_sync=Falseon all schemas - Skip Temporal workflow creation for direct query sources
- Block
reload()action for direct query sources - Return "Direct query" as status in API responses
Schema Metadata Discovery (for direct query sources):
- Add
get_primary_keys()function to discover primary key columns - Add
get_foreign_keys()function to discover FK relationships - Add
get_indexes()function to discover all table indexes - Store metadata in
ExternalDataSchema.sync_type_config["schema_metadata"] - Add helper properties:
schema.primary_key,schema.foreign_keys,schema.columns,schema.indexes - Metadata includes:
- Primary keys (single and composite)
- Foreign keys with target table/column references
- Column names and data types
- Index names, columns, uniqueness, and primary status
Frontend:
- Add "Direct query mode" toggle in Postgres source creation form
- Skip schema selection step for direct query sources
- Pass schema metadata (PK/FK/indexes) when creating direct query sources
- Update success screen messaging:
- Title: "Data source connected" instead of "Importing your data..."
- Body: "You can now query this data source directly from PostHog"
- Status: "Available" instead of "Not synced"
How did you test this code?
- Manual testing with local Postgres (Pagila demo database)
- Verified source creation with
is_direct_query=True - Verified all schemas have
should_sync=False - Verified no Temporal workflows are created
- Verified PK/FK/index metadata is correctly discovered and stored
- Added backend unit tests for direct query source behavior
Changelog: (features only) Is this feature complete?
No - this is PR 1 of 2. PR 2 will add the actual query execution engine to query direct sources.
Size Change: +651 B (+0.02%)
Total Size: 3.66 MB
âšī¸ View Unchanged
| Filename | Size | Change |
|---|---|---|
frontend/dist/toolbar.js |
3.66 MB | +651 B (+0.02%) |
Migration SQL Changes
Hey đ, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
products/data_warehouse/backend/migrations/0012_add_is_direct_query.py
BEGIN;
--
-- Add field is_direct_query to externaldatasource
--
ALTER TABLE "posthog_externaldatasource" ADD COLUMN "is_direct_query" boolean DEFAULT false NOT NULL;
ALTER TABLE "posthog_externaldatasource" ALTER COLUMN "is_direct_query" DROP DEFAULT;
COMMIT;
Last updated: 2025-12-10 14:06 UTC (df126ef)
đ Migration Risk Analysis
We've analyzed your migrations for potential risks.
Summary: 1 Safe | 0 Needs Review | 0 Blocked
â Safe
Brief or no lock, backwards compatible
data_warehouse.0012_add_is_direct_query
ââ #1 â
AddField
Adding NOT NULL field with constant default (safe in PG11+)
model: externaldatasource, field: is_direct_query
Last updated: 2025-12-10 14:06 UTC (df126ef)
âī¸ Skipped snapshot commit because branch advanced to 05d913e while workflow was testing b28577a.
The new commit will trigger its own snapshot update workflow.
If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
- Merge master into your branch, or
- Push an empty commit:
git commit --allow-empty -m 'trigger CI' && git push
âī¸ Skipped snapshot commit because branch advanced to 1d8922a while workflow was testing 05d913e.
The new commit will trigger its own snapshot update workflow.
If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
- Merge master into your branch, or
- Push an empty commit:
git commit --allow-empty -m 'trigger CI' && git push
Query snapshots: Backend query snapshots updated
Changes: 3 snapshots (3 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label â otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.