EXCEPT doesn't work at all with JOIN
Hello there,
it's me again. I found another bug. Before describing it, I just quickly wanna thank you for RBQL and Rainbow CSV. About 2 months ago I just searched for a VS Code extension to improve readably of .csv files. Would have never imagined anything with such amazing functionality; really improved my interactions with .csv files by multiple orders of magnitude.
On to the bug: I've create 2 test .csv files file1.csv
id1|data1
1|qwe
2|asd
3|zxc
4|qwe
5|asd
id2|data2
1|rty
2|fgh
3|vbn
4|rty
5|fgh
These two queries result in errors, so at least no incorrect results.
SELECT * EXCEPT b.id2 JOIN file2.csv ON a.id1==b.id2
error: 'Unknown field in EXCEPT expression: "b.id2"'
SELECT * EXCEPT b.data2 JOIN file2.csv ON a.id1==b.id2
error: 'Unknown field in EXCEPT expression: "b.data2"'
What is more problematic tho is that the following 2 queries return incorrect results. Ofc isolated like this, it's pretty obvious, that the results are off; but in use it could potentially go by unnoticed in a bigger project.
SELECT * EXCEPT a.id1 JOIN file2.csv ON a.id1==b.id2
result (2 columns are missing):
data1 qwe asd zxc qwe asd
SELECT * EXCEPT a.data1 JOIN file2.csv ON a.id1==b.id2
result (2 columns are missing):
id1 1 2 3 4 5
Best regards Tobias
Hello again, and thank you so much for the feedback, and for the bug report!
EXCEPT is a niche feature, the only major SQL engine which supports it (that I am aware of) is Big Query: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except.
At some point I even removed EXCEPT, but then changed my mind and decided to keep it, so looks like I made a good decision if you (and maybe some other people too) find it useful. Saying that I think the best course of action, for now, is to explicitly disallow EXCEPT with JOIN statements until I find time to properly implement it.
I totally agree, if you disallow them being used together, the simple workaround is to run 2 consecutive queries, the 1st with SELECT * [...] JOIN [...]
and the 2nd one with
SELECT * EXCEPT [...]
That's just mildly inconvenient, but no functionality is lost that way.
The important part is, as you also pointed out, to not produce incorrect results.
I fixed this in the master branch and will try to deploy the fix soon.