hackathon(bi): Add direct SQL query support for external database
MetaHog - Direct Database Connection BI Layer
Enable users to connect to external databases (starting with Postgres) and run SQL queries directly against them from the PostHog SQL editor, without requiring batch import to ClickHouse.
Documentation
- ARCHITECTURE.md - System overview, data flow diagrams, component details
- PLAN.md - Implementation phases and decisions
- RESEARCH.md - Codebase exploration notes
- IMPLEMENTATION.md - Detailed implementation status
Features
- [x] Add Postgres connection via UI with "Query only" checkbox
- [x] Select connected database as query target in SQL editor
- [x] Execute SQL queries directly against connected database
- [x] See schema (tables/columns) for connected databases with β‘ indicator
- [x] Results display in same format as HogQL queries
- [x] DirectQuery integrated into unified
/query/API
Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SQL Editor UI β
β βββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ β
β β Database Select β β Monaco Editor β β
β β - PostHog (HogQL)β β SELECT * FROM postgres.film β β
β β - Pagila (PG) β‘β β LIMIT 10; β β
β βββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββ΄ββββββββββββββββ
βΌ βΌ
ββββββββββββββββ ββββββββββββββββ
β ClickHouse β β External β
β (PostHog) β β Postgres β
ββββββββββββββββ ββββββββββββββββ
Changes
Backend:
-
ExternalDataSource.query_onlyfield - skip data sync for query-only sources -
DirectQueryExecutorservice - execute SQL against external Postgres -
DirectQuerykind in unified/query/API -
/direct_query/sourcesand/direct_query/schema/:idendpoints
Frontend:
-
DatabaseSelectorcomponent - switch between HogQL and external databases -
directQueryLogic.ts- Kea logic for direct query state - Schema browser integration with β‘ icon for direct query tables
- Automatic table prefix stripping (
postgres.filmβfilm)
Security
- β
Query timeout protection (30s default) via
SET statement_timeout - β
Read-only mode enforced via
SET default_transaction_read_only = ON - β Error message sanitization (no credential leaks in responses)
- β Structured logging with structlog for audit trail
- β Team isolation enforced on all endpoints
Testing
16 unit tests covering:
- API validation (source_id, sql required)
- Team isolation and access control
- Query execution happy path and error handling
- Schema retrieval
- Connection parameter extraction
- Timeout and read-only enforcement verification
- Error message sanitization
Run: pytest products/data_warehouse/backend/api/test/test_direct_query.py -v
Manual Testing
- Start demo Pagila database:
cd ~/Documents/GitHub/pagila && docker compose up -d - Add Postgres source with "Query only" checkbox
- Select database from dropdown
- Run:
SELECT * FROM postgres.film LIMIT 10
π€ Generated with Claude Code
Size Change: +668 B (+0.02%)
Total Size: 3.51 MB
βΉοΈ View Unchanged
| Filename | Size | Change |
|---|---|---|
frontend/dist/toolbar.js |
3.51 MB | +668 B (+0.02%) |
Migration SQL Changes
Hey π, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
products/data_warehouse/backend/migrations/0012_externaldatasource_query_only.py
BEGIN;
--
-- Add field query_only to externaldatasource
--
ALTER TABLE "posthog_externaldatasource" ADD COLUMN "query_only" boolean DEFAULT false NOT NULL;
ALTER TABLE "posthog_externaldatasource" ALTER COLUMN "query_only" DROP DEFAULT;
COMMIT;
products/data_warehouse/backend/migrations/0013_add_is_direct_query_to_table.py
BEGIN;
--
-- Add field is_direct_query to datawarehousetable
--
ALTER TABLE "posthog_datawarehousetable" ADD COLUMN "is_direct_query" boolean DEFAULT false NOT NULL;
ALTER TABLE "posthog_datawarehousetable" ALTER COLUMN "is_direct_query" DROP DEFAULT;
--
-- Alter field format on datawarehousetable
--
-- (no-op)
--
-- Alter field url_pattern on datawarehousetable
--
-- (no-op)
COMMIT;
Last updated: 2025-12-09 21:23 UTC (be3b27c)
π Migration Risk Analysis
We've analyzed your migrations for potential risks.
Summary: 1 Safe | 1 Needs Review | 0 Blocked
β οΈ Needs Review
May have performance impact
data_warehouse.0013_add_is_direct_query_to_table
ββ #1 β
AddField
Adding NOT NULL field with constant default (safe in PG11+)
model: datawarehousetable, field: is_direct_query
ββ #2 β οΈ AlterField
Field alteration may cause table locks or data loss (check if changing type or constraints)
model: datawarehousetable, field: format, field_type: CharField
ββ #3 β οΈ AlterField
Field alteration may cause table locks or data loss (check if changing type or constraints)
model: datawarehousetable, field: url_pattern, field_type: CharField
β Safe
Brief or no lock, backwards compatible
data_warehouse.0012_externaldatasource_query_only
ββ #1 β
AddField
Adding NOT NULL field with constant default (safe in PG11+)
model: externaldatasource, field: query_only
Last updated: 2025-12-09 21:23 UTC (be3b27c)
Query snapshots: Backend query snapshots updated
Changes: 3 snapshots (3 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change
Wiz Scan Summary
| Scanner | Findings |
|---|---|
| - | |
| 1 |
|
| - | |
| - | |
| - | |
| Total | 1 |
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.
βοΈ Skipped snapshot commit because branch advanced to bcdaaa6 while workflow was testing 3a40c78.
The new commit will trigger its own snapshot update workflow.
If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
- Merge master into your branch, or
- Push an empty commit:
git commit --allow-empty -m 'trigger CI' && git push
Query snapshots: Backend query snapshots updated
Changes: 2 snapshots (2 modified, 0 added, 0 deleted)
What this means:
- Query snapshots have been automatically updated to match current output
- These changes reflect modifications to database queries or schema
Next steps:
- Review the query changes to ensure they're intentional
- If unexpected, investigate what caused the query to change
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label β otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.