LocustDB Queries that have type errors or reference missing columns should give helpful errors/warnings

STR:

load https://flatteningthecurve.herokuapp.com/data/canadatesting
execute select cumulative_testing from default where province = "Ontario"
you get: Type error: Function Equals is not implemented for types Type { decoded: String, codec: Some(Codec { ops: [PushDataSection(1), PushDataSection(2), DictLookup(U8)], column_name: "province", section_types: [U8, U64, U8], decoded_type: String, is_summation_preserving: false, is_order_preserving: true, is_fixed_width: true }), is_scalar: false, is_borrowed: false }, Type { decoded: Null, codec: None, is_scalar: false, is_borrowed: false }

May 22 '20 22:05 jrmuizel

Thanks for reporting this! Queries like this should work so there's probably some bug that won't be too hard to fix. I'll take a closer look on the weekend.

May 23 '20 00:05 cswinter

Actually, the problem looks like it might be caused by province being a missing/null column. What do you get when you run the :memtree command? What command did you use to load the data?

So I think there's two issues here:

comparison between null and String is not defined, this should probably just evaluate to false
querying columns that don't exist should display a warning

May 23 '20 00:05 cswinter

$ ./repl --load /tmp/canada_testing.csv
Loading /tmp/canada_testing.csv into table default.
Loaded data in 1.76ms.

# Table `default` (884 rows, 9.39KiB) #
id: 1.7KiB
province: 1.1KiB
date: 3.1KiB
cumulative_testing: 3.5KiB

# Table `_meta_tables` (0 rows, 153B) #
locustdb> :memtree
default  9.39KiB  884 rows  10.9B/row
├─ cumulative_testing       fully resident    3.5KiB   37%    4.0B/row
├─ date                     fully resident    3.1KiB   33%    3.6B/row
├─ id                       fully resident    1.7KiB   18%    2.0B/row
└─ province                 fully resident    1.1KiB   12%    1.3B/row

_meta_tables  0.000B  0 rows  NaNB/row

May 23 '20 02:05 jrmuizel

Oh I think I figured out the actual problem, the parser interprets "Ontario" as an identifier for the (non-existent) Ontario column. This query should work: select cumulative_testing from default where province = 'Ontario'.

~The LocustDB version on latest master actually works as you would expect for the original query~ and has some other fixes as well, I ought to publish a new version. I'll probably want to automate the whole release process with a github action which will take me a little while.

And of course the error message should still be improved.

May 23 '20 03:05 cswinter

Correction: single quotes are in fact the correct way to delimit strings in SQL and the only reason it was working for me was because I was testing with an old version of LocustDB with less sophisticated parser.

May 23 '20 03:05 cswinter

Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives: ERROR: column "Ontario" does not exist in this case.

May 24 '20 14:05 jrmuizel

Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives: ERROR: column "Ontario" does not exist in this case.

Basically yes, though it's a bit more tricky in LocustDB because its semantics actually allow for querying non-existent columns which are then assumed to be all nulls. This is because it has no fixed schema and any column might exist, even if there are no values within any particular shard.

List of action items from this issue:

[x] Automate release process and release new LocustDB version
[ ] Improve error messages for type errors by pointing to the offending subexpression and include the simple base types in addition to the full type expression that include codecs
[ ] Detect when queries contain a column that is null/non-existent across all shards and print a warning
[ ] Detect when double quotes are used in a way that implies a string was intended and print helpful warning messge
[ ] Make String<->Null equality operator work

May 24 '20 15:05 cswinter

LocustDB LocustDB copied to clipboard

Queries that have type errors or reference missing columns should give helpful errors/warnings

LocustDB
LocustDB copied to clipboard