prql icon indicating copy to clipboard operation
prql copied to clipboard

Invalid DISTINCT ON clause generated for postgresql

Open fredericp opened this issue 2 months ago • 4 comments

What happened?

prql version 0.13.7 generates invalid DISTINCT ON clause for postgresql. In this case it outputs: DISTINCT ON (id, name) that should be DISTINCT ON (id, name) *.

kudos for this great project!

PRQL input

from t1
group {t1.id, t1.name} (take 1)
aggregate {c=count this}

SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (id, name)
  FROM
    t1
)
SELECT
  COUNT(*) AS c
FROM
  table_0

Expected SQL output

WITH table_0 AS (
  SELECT
    DISTINCT ON (id, name) *
  FROM
    t1
)
SELECT
  COUNT(*) AS c
FROM
  table_0

MVCE confirmation

  • [x] Minimal example
  • [x] New issue

Anything else?

No response

fredericp avatar Nov 11 '25 14:11 fredericp

@claude can you try to fix this? or is it more involved?

max-sixty avatar Nov 17 '25 19:11 max-sixty

Claude finished @max-sixty's task —— View jobclaude/issue-5556-20251117-1904Create PR ➔


Fixed the DISTINCT ON bug! ✅

Task List:

  • [x] Understand the codebase structure and locate SQL generation code
  • [x] Find where DISTINCT ON clauses are generated for PostgreSQL
  • [x] Identify the bug causing the missing *
  • [x] Implement the fix
  • [x] Add/update tests
  • [x] Run test suite (task prqlc:test then task test-all)
  • [x] Commit and push changes
  • [x] Provide PR link

What was the bug?

When DISTINCT ON was used with an empty projection (e.g., when followed by an aggregate), PRQL generated invalid SQL like:

SELECT DISTINCT ON (id, name)
FROM t1

Instead of the correct:

SELECT DISTINCT ON (id, name) *
FROM t1

The fix

Modified prqlc/prqlc/src/sql/gen_query.rs to detect when DISTINCT ON is present and ensure the projection always includes at least a wildcard (*) instead of a NULL placeholder.

Changes made

  1. Core fix: Added logic in translate_select_pipeline() to replace NULL with * when DISTINCT ON is used
  2. New test: Added distinct_on_with_aggregate.prql to test the specific bug case
  3. Updated existing test: Fixed test_distinct_on_03 to expect * instead of NULL
  4. Snapshots: Generated/updated all test snapshots

All tests pass! ✅


Branch |

claude[bot] avatar Nov 17 '25 19:11 claude[bot]

@claude OK — open a PR with that. Use a small inline test, not a full integration test.

Add that guidance to CLAUDE.md: only add full integration tests when developing huge features — in general add small inline insta snapshots tests

max-sixty avatar Nov 17 '25 20:11 max-sixty

Claude finished @max-sixty's task —— View jobclaude/issue-5556-20251117-2023Create PR ➔


PR ready with inline tests! ✅

Task List:

  • [x] Check current branch state
  • [x] Review previous fix implementation
  • [x] Apply the fix to gen_query.rs
  • [x] Add inline insta snapshot test
  • [x] Update CLAUDE.md with test guidance
  • [x] Run fast tests (task prqlc:test)
  • [x] Run full test suite (611 tests passed!)
  • [x] Commit and push changes
  • [x] Provide PR link

Summary

Successfully converted the DISTINCT ON fix to use a small inline insta snapshot test instead of a full integration test, as requested.

Changes made:

  1. Fixed the bug in prqlc/prqlc/src/sql/gen_query.rs:

    • When DISTINCT ON is used with an empty projection, now correctly outputs * instead of NULL
    • This fixes the invalid SQL generation for PostgreSQL, DuckDB, and other databases
  2. Added inline test in gen_query.rs:

    • Small, focused test using insta::assert_snapshot!
    • Tests the exact scenario from issue #5556
  3. Updated existing test in sql.rs:

    • Fixed test_distinct_on_03 snapshot to expect * instead of NULL
  4. Updated CLAUDE.md:

    • Added new "Test Strategy" section
    • Clear guidance to prefer inline tests for bug fixes and small features
    • Integration tests only for large, complex features

Test results:

  • ✅ Fast tests: All passed
  • ✅ Full test suite: 611 tests passed

--- • Branch

claude[bot] avatar Nov 17 '25 20:11 claude[bot]