WIP Enhancement/7992967434/filters and projections ternary operator

Open alexowens90 opened this issue 10 months ago • 1 comments

Reference Issues/PRs

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...
  • [ ] Have you updated the relevant docstrings, documentation and copyright notice?
  • [ ] Is this contribution tested against all ArcticDB's features?
  • [ ] Do all exceptions introduced raise appropriate error messages?
  • [ ] Are API changes highlighted in the PR description?
  • [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

alexowens90 Jan 07 '25 09:01

Would be good to give more helpful error messages when users misuse the API, eg,

def test_filter_ternary_column_full_and_empty_results(lmdb_version_store_v1):
    lib = lmdb_version_store_v1
    symbol = "test_filter_ternary_column_full_and_empty_results"
    df = pd.DataFrame(
        {
            "conditional": [True, False, False, True, False, True],
            "col1": ["a", "b"] * 3,
            "col2": [0] * 6,
        },
        index=pd.date_range("2024-01-01", periods=6)
    )
    lib.write(symbol, df)

    q = QueryBuilder()
    q = q[where(q["conditional"], q["col1"], q["col1"])]
    received = lib.read(symbol, query_builder=q).data
    # Raises arcticdb_ext.exceptions.InternalException: E_ASSERTION_FAILURE Cannot convert column '(COND ? col1 : col1)' of type TD<type=STRING, dim=0> to a bitset

That error actually comes from the filter applied after the ternary operator, not from the ternary operator itself; you'd get the same with q = q[q["col1"]]. The only way to make it more explanatory would be to attach expression names to bitsets in VariantData, similar to what we already do for columns that are functions of other columns. That's quite a big refactor though. If we decide it's worth it, I'd do it in a separate PR.
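To make the idea concrete, here is a rough pure-Python sketch of what "attaching expression names" could buy in terms of error messages. It is a toy model only, not ArcticDB's C++ implementation: `Column`, `Bitset`, `as_filter` and this `where` are hypothetical stand-ins invented for illustration, and the message wording is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Column:
    """Toy stand-in for a column; `name` records the expression that produced it."""
    name: str
    dtype: str
    values: list


@dataclass
class Bitset:
    """Toy stand-in for a filter mask with a (hypothetical) expression name attached."""
    name: str
    bits: List[bool]


Operand = Union[Column, Bitset]


def as_filter(operand: Operand) -> Bitset:
    """Convert an operand to a filter mask, or explain why that is not possible."""
    if isinstance(operand, Bitset):
        return operand
    if operand.dtype == "BOOL":
        return Bitset(operand.name, [bool(v) for v in operand.values])
    raise TypeError(
        f"Cannot use expression '{operand.name}' as a filter: it produces a "
        f"{operand.dtype} column, but a filter needs a boolean expression"
    )


def where(cond: Operand, left: Operand, right: Operand) -> Operand:
    """Row-wise ternary: take `left` where `cond` is set, otherwise `right`."""
    mask = as_filter(cond)
    if type(left) is not type(right):
        raise TypeError(
            f"Both branches must be columns or both must be filters: "
            f"'{left.name}' vs '{right.name}'"
        )
    # The result carries the full expression name, whether it is a column or a mask.
    name = f"({mask.name} ? {left.name} : {right.name})"
    if isinstance(left, Bitset):
        bits = [l if c else r for c, l, r in zip(mask.bits, left.bits, right.bits)]
        return Bitset(name, bits)
    values = [l if c else r for c, l, r in zip(mask.bits, left.values, right.values)]
    return Column(name, left.dtype, values)


# Same shape of data as the test above.
conditional = Column("conditional", "BOOL", [True, False, False, True, False, True])
col1 = Column("col1", "STRING", ["a", "b"] * 3)
col2_is_zero = Bitset("col2 == 0", [True] * 6)
col2_is_one = Bitset("col2 == 1", [False] * 6)

# Misuse: a ternary over string columns cannot act as a filter.
try:
    as_filter(where(conditional, col1, col1))
    message = ""
except TypeError as err:
    message = str(err)

# Valid use: both branches are themselves filters, so the result is a named mask.
mask = as_filter(where(conditional, col2_is_zero, col2_is_one))
```

In this toy version the misuse from the test fails with a message that names the offending expression and its type, and the valid filter case yields a mask that still knows which expression produced it. Whether the real refactor would look anything like this is open.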

alexowens90 May 21 '25 15:05

Makes sense, thanks. This was my mistake while playing around with that test.

poodlewars May 21 '25 15:05