pypika
pypika copied to clipboard
Warn on reusing common subqueries across multiple QueryBuilders
For beginners and library writers who work atop pypika, there is a common pattern of reusing Pypika objects for queries:
subquery = Query.from_(table).select(table.col1.as_("col"), table.col2)
query1 = Query.from_(subquery).select(Avg(subquery.col)
...
query2 = Query.from_(subquery).select(Sum(subquery.col)
The nested subqueries currently fail this task because the alias for query
is changed in-place when it's Query.from_()
'd. In general, these unintended, subtle side-effects should at least have a warning for the users. In fact, even more subtle scenarios emerge when trying to join multiple tables which have been assigned an alias on accident and resulted in ambiguous names:
> SELECT "sq0"."a","sq0"."c"
-> FROM
-> (SELECT "a","b"
-> FROM
-> (SELECT *
-> FROM (
-> VALUES (1,3), (2,4), (3,5)) AS "yy14" ("a","b"))) "sq0"
-> JOIN
-> (SELECT "c","b"
-> FROM
-> (SELECT *
-> FROM (
-> VALUES (5,3), (6,5), (7,6)) AS "7a4d" ("c","b"))) "sq0" ON "sq0"."b"="sq0"."b";
...
Column 'sq0.b' is ambiguous
As a result, I propose we add a warning to any user who is encountering this subtle caveat from pypika:
ReusedNestedQueryWarning: Query SELECT "name","customer" FROM "a" is used as a nested subquery elsewhere. Consider deepcopying the query to avoid unexpected aliasing.
warn(f"Query {selectable} is used as a nested subquery elsewhere. "
I found that solving this problem but providing backwards compatibility is impractical, since we'd need to deepcopy every subquery that comes in to prevent mutating state, but also understand that Term
s deriving from subqueries belong to the deepcopied version of its table parent. The surface area for Term
s is large enough that I think having a warning would be a good first step if any for this issue.