posthog
posthog copied to clipboard
feat(hogql): add list expr
Problem
SQL syntax is more expressive than HogQL AST nodes for complex structures. Unfortunately parse_select and parse_expr (1) don't support list placeholders, (2) don't support optional siblings and (3) don't support nullable placeholders.
Examples:
parse_select('SELECT {columns} FROM events', placeholders={"columns": [ast.Constant(value=1), ast.Constant(value=2)]})(doesn't support list-type columns)parse_expr("max(steps) over (PARTITION BY aggregation_target {placeholder}) as max_steps", placeholders={"placeholder": ast.Field(chain=['some_col'])})(doesn't support the placeholder sibling)parse_select('SELECT event FROM events WHERE {where}', placeholders={"where": None})(doesn't support nullable placeholder)
Changes
This PR introduces a ast.ListExpr node that can only be added manually to the parse tree and gets expanded to a list. It would be great to add sibling support to the parser for this as well.
Alternatives
- Construct the AST manually (without using
parse_selectorparse_expr). - Use f-strings that are then parsed. This would mean that we'd need to handle things like column names as strings instead of
ast.Fields as long as possible. - Parse the SQL and then modify the AST.
QueryAlternator.append_select
How did you test this code?
Going to add test cases after feedback.
The third case is usually solved with a placeholders={"where": ast.Constant(value=True)}.
However, yes, placeholders aren't supported everywhere. So far when there has been a new place we need them, we've added them to the grammar directly, with corresponding parser nodes in both the Python and C versions.
The third case is usually solved with a
placeholders={"where": ast.Constant(value=True)}.
Do we actually need the error message for None values? Using the commit I just pushed, things like the example would just work without a workaround.
However, yes, placeholders aren't supported everywhere. So far when there has been a new place we need them, we've added them to the grammar directly, with corresponding parser nodes in both the Python and C versions.
Does the ListExpr thing make sense? Or would I rather adapt the SelectQuery parser to handle both single- and list type placeholders?
parse_select('SELECT {columns} FROM events', placeholders={"columns": [ast.Constant(value=1), ast.Constant(value=2)]}) (doesn't support list-type columns)
Out of interest, does using ast.Array(exprs=[...]) work here as a placeholder instead of a raw python array?
Does the ListExpr thing make sense? Or would I rather adapt the SelectQuery parser to handle both single- and list type placeholders?
I reckon adapting select query to handle array placeholders (either via raw python arrays, or via the ast.Array node) would be preferable than adding a ListExpr
parse_select('SELECT {columns} FROM events', placeholders={"columns": [ast.Constant(value=1), ast.Constant(value=2)]}) (doesn't support list-type columns)
Out of interest, does using
ast.Array(exprs=[...])work here as a placeholder instead of a raw python array?
ast.Arrays are actual SQL arrays as in SELECT [1,2].
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.
This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.