LocustDB
LocustDB copied to clipboard
Queries that have type errors or reference missing columns should give helpful errors/warnings
STR:
- load https://flatteningthecurve.herokuapp.com/data/canadatesting
- execute
select cumulative_testing from default where province = "Ontario"
- you get:
Type error: Function Equals is not implemented for types Type { decoded: String, codec: Some(Codec { ops: [PushDataSection(1), PushDataSection(2), DictLookup(U8)], column_name: "province", section_types: [U8, U64, U8], decoded_type: String, is_summation_preserving: false, is_order_preserving: true, is_fixed_width: true }), is_scalar: false, is_borrowed: false }, Type { decoded: Null, codec: None, is_scalar: false, is_borrowed: false }
Thanks for reporting this! Queries like this should work so there's probably some bug that won't be too hard to fix. I'll take a closer look on the weekend.
Actually, the problem looks like it might be caused by province
being a missing/null column. What do you get when you run the :memtree
command? What command did you use to load the data?
So I think there's two issues here:
- comparison between
null
andString
is not defined, this should probably just evaluate to false - querying columns that don't exist should display a warning
$ ./repl --load /tmp/canada_testing.csv
Loading /tmp/canada_testing.csv into table default.
Loaded data in 1.76ms.
# Table `default` (884 rows, 9.39KiB) #
id: 1.7KiB
province: 1.1KiB
date: 3.1KiB
cumulative_testing: 3.5KiB
# Table `_meta_tables` (0 rows, 153B) #
locustdb> :memtree
default 9.39KiB 884 rows 10.9B/row
├─ cumulative_testing fully resident 3.5KiB 37% 4.0B/row
├─ date fully resident 3.1KiB 33% 3.6B/row
├─ id fully resident 1.7KiB 18% 2.0B/row
└─ province fully resident 1.1KiB 12% 1.3B/row
_meta_tables 0.000B 0 rows NaNB/row
Oh I think I figured out the actual problem, the parser interprets "Ontario" as an identifier for the (non-existent) Ontario
column. This query should work: select cumulative_testing from default where province = 'Ontario'
.
~The LocustDB version on latest master actually works as you would expect for the original query~ and has some other fixes as well, I ought to publish a new version. I'll probably want to automate the whole release process with a github action which will take me a little while.
And of course the error message should still be improved.
Correction: single quotes are in fact the correct way to delimit strings in SQL and the only reason it was working for me was because I was testing with an old version of LocustDB with less sophisticated parser.
Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives: ERROR: column "Ontario" does not exist
in this case.
Is it possible to give a better error message in this case? https://github.com/MaterializeInc/materialize gives:
ERROR: column "Ontario" does not exist
in this case.
Basically yes, though it's a bit more tricky in LocustDB because its semantics actually allow for querying non-existent columns which are then assumed to be all nulls. This is because it has no fixed schema and any column might exist, even if there are no values within any particular shard.
List of action items from this issue:
- [x] Automate release process and release new LocustDB version
- [ ] Improve error messages for type errors by pointing to the offending subexpression and include the simple base types in addition to the full type expression that include codecs
- [ ] Detect when queries contain a column that is null/non-existent across all shards and print a warning
- [ ] Detect when double quotes are used in a way that implies a string was intended and print helpful warning messge
- [ ] Make String<->Null equality operator work