ArcticDB
WIP Enhancement/7992967434/filters and projections ternary operator
Reference Issues/PRs
What does this implement or fix?
Any other comments?
Checklist
Checklist for code changes...
- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against all ArcticDB's features?
- [ ] Do all exceptions introduced raise appropriate error messages?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?
It would be good to give more helpful error messages when users misuse the API, e.g.:
```python
def test_filter_ternary_column_full_and_empty_results(lmdb_version_store_v1):
    lib = lmdb_version_store_v1
    symbol = "test_filter_ternary_column_full_and_empty_results"
    df = pd.DataFrame(
        {
            "conditional": [True, False, False, True, False, True],
            "col1": ["a", "b"] * 3,
            "col2": [0] * 6,
        },
        index=pd.date_range("2024-01-01", periods=6),
    )
    lib.write(symbol, df)
    q = QueryBuilder()
    q = q[where(q["conditional"], q["col1"], q["col1"])]
    received = lib.read(symbol, query_builder=q).data  # blows up:
    # E arcticdb_ext.exceptions.InternalException: E_ASSERTION_FAILURE Cannot convert column '(COND ? col1 : col1)' of type TD<type=STRING, dim=0> to a bitset
```
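To make the failure concrete, here is a minimal pure-Python sketch of elementwise ternary semantics. It assumes columns are plain lists and that a filter only accepts a boolean mask; `ternary` and `apply_filter` are hypothetical helpers, not ArcticDB APIs. With `col1` in both branches the result is just the string column `col1`, which can be projected but cannot act as a filter mask; with boolean branches the result can.

```python
# Hypothetical pure-Python model of an elementwise ternary over list "columns".
# It only illustrates semantics; it is NOT how ArcticDB implements `where`.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ternary(condition: Sequence[bool], left: Sequence[T], right: Sequence[T]) -> List[T]:
    """Take left[i] where condition[i] is True, otherwise right[i]."""
    if not (len(condition) == len(left) == len(right)):
        raise ValueError("condition, left and right must have the same length")
    return [x if c else y for c, x, y in zip(condition, left, right)]


def apply_filter(rows: Sequence[T], mask: Sequence[object]) -> List[T]:
    """Keep rows whose mask entry is True; the mask must contain only booleans."""
    if not all(isinstance(m, bool) for m in mask):
        raise TypeError("filter expression must evaluate to booleans")
    return [row for row, keep in zip(rows, mask) if keep]


# Data from the test above; rows stands in for row positions.
conditional = [True, False, False, True, False, True]
col1 = ["a", "b"] * 3
col2 = [0] * 6
rows = list(range(6))

# Both branches are col1, so the result is just the string column col1:
# usable as a projected column, but not as a filter mask.
projected = ternary(conditional, col1, col1)

# Boolean branches give a boolean mask, which a filter can use.
mask = ternary(conditional, [v == "a" for v in col1], [v > 0 for v in col2])
kept = apply_filter(rows, mask)  # [0]
```

Calling `apply_filter(rows, projected)` raises `TypeError`, which is the pure-Python analogue of the "cannot convert ... to a bitset" assertion.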
That is actually coming from the filter applied after the ternary operator; you'd get the same error with `q = q[q["col1"]]`. The only way to make the message more explanatory would be to attach expression names to bitsets in `VariantData`, similar to what we do for columns that are functions of other columns. That's quite a big refactor though, so if we decide it's worth it I'd do it in a separate PR.
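The naming behaviour described above can be modelled with a small, hypothetical Python sketch. `Column`, `Bitset`, `ternary` and `to_bitset` are illustrative stand-ins for the C++ `VariantData` machinery, not ArcticDB code. It assumes, based on the `(COND ? col1 : col1)` name in the error, that the condition reaches the ternary as an unnamed bitset; the optional `name` field on `Bitset` models the proposed refactor.

```python
# Hypothetical model of VariantData-style intermediate results. This is an
# illustrative sketch, not ArcticDB's C++ implementation.
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Column:
    name: str        # expression name, e.g. "col1" or "(COND ? col1 : col1)"
    type_name: str   # simplified type tag, e.g. "STRING" or "BOOL"
    values: list


@dataclass
class Bitset:
    bits: List[bool]
    # Assumed extension from the proposed refactor: the expression that
    # produced this bitset. None models today's unnamed bitsets.
    name: Optional[str] = None


VariantData = Union[Column, Bitset]


def ternary(cond: Bitset, left: Column, right: Column) -> Column:
    """Elementwise select; assumes both branches share a type."""
    cond_label = cond.name if cond.name is not None else "COND"
    values = [x if c else y for c, x, y in zip(cond.bits, left.values, right.values)]
    return Column(f"({cond_label} ? {left.name} : {right.name})", left.type_name, values)


def to_bitset(data: VariantData) -> Bitset:
    """What a filter stage needs: a bitset, or a BOOL column that becomes one."""
    if isinstance(data, Bitset):
        return data
    if data.type_name == "BOOL":
        return Bitset(list(data.values), name=data.name)
    raise TypeError(
        f"Cannot convert column '{data.name}' of type {data.type_name} to a bitset"
    )


# Usage with the data from the test above.
col1 = Column("col1", "STRING", ["a", "b"] * 3)
cond = Bitset([True, False, False, True, False, True])  # unnamed, as today
projected = ternary(cond, col1, col1)                   # "(COND ? col1 : col1)"
named_cond = Bitset(cond.bits, name="conditional")      # with the refactor
named = ternary(named_cond, col1, col1)                 # "(conditional ? col1 : col1)"
```

In this model, `to_bitset(projected)` and `to_bitset(col1)` fail in the same step, matching the point that the error comes from the filter rather than from the ternary itself.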
Makes sense, thanks. This was my mistake while playing around with that test.