sqlfluff
sqlfluff copied to clipboard
BigQuery: Accessing `STRUCT` elements evades triggering L027
Search before asking
- [X] I searched the issues and found no similar issues.
What Happened
Accessing unreferenced STRUCT elements using BigQuery dot notation in a multi table query does not trigger L027.
Expected Behaviour
L027 gets triggered.
Observed Behaviour
L027 does not get triggered.
How to reproduce
SELECT
t1.col1,
t2.col2,
events.id
FROM t_table1 AS t1
LEFT JOIN t_table2 AS t2
ON TRUE
Dialect
BigQUery
Version
0.11.2 using online.sqlfluff.com
Configuration
N/A
Are you willing to work on and submit a PR to address the issue?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This is tricky.
Basicaly L026 works to make sure qualified columns only use tables in the from clause. This doesn’t really work for STRUCTs as impossible to know if it’s a qualified column or a STRUCT, so is off by default for languages that support them - like BigQuery.
L027 works to make sure columns are qualified for multi-table joins (i.e. have at least one dot). But it doesn’t check the qualifiers are valid - that’s L026’s job, which as I say is off by default for BigQuery.
This is tricky.
Basicaly L026 works to make sure qualified columns only use tables in the from clause. This doesn’t really work for
STRUCTs as impossible to know if it’s a qualified column or aSTRUCT, so is off by default for languages that support them - like BigQuery.L027 works to make sure columns are qualified for multi-table joins (i.e. have at least one dot). But it doesn’t check the qualifiers are valid - that’s L026’s job, which as I say is off by default for BigQuery.
But my understanding of why L026 is turned off for languages with structs is because of false positives. I.e. for
SELECT
col1,
a.col2
it is not clear a-priori whether a. is table alias (in which case L026 should trigger because of mixed references) or a is in fact a struct and L026 should not trigger.
However, in L027 we are in a slightly different scenario, as we do not need to detect whether we are in a mixed scenario or not. We already know, that we want every column to have a qualified reference. So, the task (as far as L027 is concerned) becomes checking whether each reference is a) qualified and b) valid (exists in the list of table aliases).
I have been tinkering around with this a little bit, I'll send a PR where we can discuss this further in a moment.
You are correct. It was just that, other than for BigQuery, it was not really necessary to test for b) (valid) since the combination of L026 and L027 would effectively test for that.
It does feel like a little bit of duplication to do this test in both L026 and L027.
It also means any invalid references will flag in both L026 and L027 for the same error.
You are correct. It was just that, other than for BigQuery, it was not really necessary to test for b) (valid) since the combination of L026 and L027 would effectively test for that.
It does feel like a little bit of duplication to do this test in both L026 and L027.
It also means any invalid references will flag in both L026 and L027 for the same error.
I was actually talking about L028 not L026 🤦 but the argument is the same.
I agree that duplicated errors would be very odd for non-BigQuery users. What do you think about making the check for valid references a configurable option to L027 which is disabled by default?
This way BigQuery users could benefit from the additional check while other do not get disturbed. Code-wise it doesn't seem too much of duplication, as we'd be reusing most of L020's work (although I think it only covers SELECT)
So here's some SQL:
SELECT
t3.col1,
col2
FROM
table1 t1
JOIN
table2 t2
ON t1.col1 = t2.col1
This would potentially flag:
- L026 - for
t3.col1(but not for bigquery by default) - L027 - for
t3.col1(but only because there are two or more tables otherwise it wouldn't flag here) - L027 for
col2 - L028 for
t3.col1andcol2, offering to fixcol2
Maybe that's OK though since they often won't flag together? And luckily only L028 is fix compatible so at least L026 and L027 won't both try to fix at same time.
Will give it some more thought...