posthog
                                
                                 posthog copied to clipboard
                                
                                    posthog copied to clipboard
                            
                            
                            
                        feat(hogql): support CTEs in UNION ALL
Problem
https://github.com/PostHog/posthog/issues/20663
JFYI, ClickHouse doesn't allow one to use CTEs from the first statement if a statement has its own CTE.
This doesn't work in CH
WITH cte1 AS (SELECT 1 AS a)
SELECT 1 AS a
UNION ALL
WITH cte2 AS (SELECT 1 AS a)
SELECT * FROM cte1
Changes
It adds CTEs from the initial statement to all statements in UNION ALL
Does this work well for both Cloud and self-hosted?
Yes
How did you test this code?
Automated tests
🔍 Existing Issues For Review
Your pull request is modifying functions with the following pre-existing issues:
📄 File: posthog/hogql/resolver.py
| Function | Unhandled Issue | 
|---|---|
| visit | TypeError: sequence item 0: expected str instance, list found posthog.tasks.tasks.proce... Event Count:7 | 
Did you find this useful? React with a 👍 or 👎
Hi @thmsobrmlr! PTAL
Thanks for the review @mariusandra
Hey thanks for this! I added one comment inline with regards to some optimizations. However the bigger question is: do we even need to do this cloning? Would it be better to update the lookup functions visit_field in resolver to also "go higher" and loop through all SelectQueryType-s in self.scopes, looking for CTEs that could match?
 
That won't work because selects from union all are processed sequentially and not recursively. So every select is processed only with its own scope, because the scope is popped after processing. I confirmed that by debugging as well.
That won't work because
selects fromunion allare processed sequentially and not recursively. So every select is processed only with its own scope, because the scope is popped after processing. I confirmed that by debugging as well.
We could add the SelectUnionQueryType also onto the scope, and visit the other queries that way. I checked briefly, and it seemed most (all?) things that access the last scope will only do it in a context where the last thing will be a SelectQueryType anyway, so this probably won't require a lot of rewiring.
We could add the SelectUnionQueryType also onto the scope, and visit the other queries that way. I checked briefly, and it seemed most (all?) things that access the last scope will only do it in a context where the last thing will be a SelectQueryType anyway, so this probably won't require a lot of rewiring.
IIUC your main concern is that I'm rewriting the query while resolving types, which is not the best way to propagate context to the downstream expressions. I agree with that.
I'm not sure adding SelectUnionQueryType to the scopes is the right approach though, IMO it complicates the scopes semantics, mostly because now we have two types in the scopes. What I could do is to add an ephemeral SelectQueryType onto the scopes. This type would only have the CTEs of the first Select, it should work out of the box (see this commit what I mean, the tests do pass). There's one downside though, the following query will work in HogQL, but it will not work in ClickHouse
WITH first_cte AS (SELECT 1 AS a)
SELECT * FROM first_cte
UNION ALL
WITH overriding_cte AS (SELECT 1 AS a)
SELECT * FROM first_cte -- note - one can't use first_cte here, because overriding_cte overrides the first_cte
That will happen because resolver searches for CTEs in all scopes. I think it's not that big of an issue, WDYT?
Your approach makes sense. We're actually doing something similar with lambdas as well.
It actually hints at a refactor we haven't taken on yet: I'd argue SelectQueryType has taken on too many responsibilities, and should split part of it to a new Scope class. Then self.scopes would actually be a list[Scope].
In that future world, you'd just create a new scope for the SelectUnionQueryType, and add it to the list with the rest of them. However for now, we're forced to use SelectQueryType, as a sort of scope. A refactor for this is definitely out of scope (bad pun), so what you're proposing makes total sense.
bad pun
Nah, made me giggle:)
I'd argue SelectQueryType has taken on too many responsibilities, and should split part of it to a new Scope class
Yeah, I agree, this class looks much more like a Scope rather than SelectQueryType atm.
Anyway, I updated the PR, please take a look!