ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat(api): add FieldsNotFoundError

Open NickCrews opened this issue 1 year ago • 13 comments

I have been getting sick of typing some_table_or_struct.field_that_doesnt_exist_or_has_a_small_typo and then getting a useless error message. It also is annoying to do some_table.select(doesnt_exist, also_doesnt_exist), and you only get an error for the very first one. It would be better if you got back info on ALL the values that failed to get resolved.

This PR makes that UX much better.

Still need to add tests, but I wanted to get this up here for some initial thoughts before I invested more time. Is this something we want to pursue?

NickCrews avatar Nov 01 '24 19:11 NickCrews

@cpcloud does this look like an idea worth considering? Any behavior requirements that you think I would need to make an implementation satisfy?

NickCrews avatar Dec 05 '24 00:12 NickCrews

@cpcloud should I invest more time in this or do you think this whole idea is not the right direction?

NickCrews avatar Jan 24 '25 06:01 NickCrews

I'm not opposed to this but I think the behavior should be different for attribute versus getitem syntax.

It all comes down to what we believe the user's intent is, to the best of our knowledge.

For attribute misspellings, I think we should leave them as plain old AttributeError and do nothing fancy. We have no idea whether the user meant a method or a column, and we should really refuse the temptation to guess.

For [] syntax, the only thing that could possibly be is a field access (in both the column and the struct field case), so upgrading the UX with FieldNotFound seems like a nice improvement.

cpcloud avatar Jan 24 '25 13:01 cpcloud

That sounds like good behavior to me to return an AttributeError for .dot accesses. I think that makes sense to "refuse the temptation to guess" and raise an AttributeError and not a FieldNotFoundError, because that is better to be caught programmatically. But, what about being a little bit fancy/helpful, and returning a TableAttributeError, which is a subclass of AttributeError, which has the helpful "did you mean this column?" error message? Then this is the best of both worlds?

There is one bit of nuance on attribute access though. If I do t.bogus, I agree that should just be the above behavior, it is impossible to guess if the user meant a method or a column, so just return an AttributeError (or subclass of it as I suggest above). However, if I do t.select(_.bogus), then this is unambiguous that they wanted a Column, and so returning a FieldNotFoundError seems better.

In pseudocode:

# Raised on t.bogus
# Includes helpful suggestion "did you mean <best match among all columns AND attributes/methods>"
class TableAttributeError(AttributeError, IbisError): ...

# Raised on my_struct.bogus
# Includes helpful suggestion "did you mean <best match among all fields AND attributes/methods>"
# Maybe should get consolidated with TableAttributeError
class StructAttributeError(AttributeError, IbisError): ...

# Raised on table_or_struct["bogus", "bogus2"] and table_or_struct.select(_.bogus, _.bogus2) and my_struct["bogus"]
# Includes helpful suggestion "did you mean <best match among ONLY columns/fields>"
# Note that this contains ALL not found fields.
# I think that would be simpler to just have one error, and not one for single FieldNotFound.
# Should this inherit from AttributeError and/or KeyError?
# I think AttributeError since 1. these are statically known and 2. then a user can `catch AttributeError`
# and have all of these errors caught.
class FieldsNotFoundError(AttributeError, IbisError): ...

What do you think of this spec?

NickCrews avatar Jan 25 '25 00:01 NickCrews

and do nothing fancy we should really refuse the temptation to guess

So I understand your goals, is your reasoning here so that 1. our maintenance is easier or 2. so we don't accidentally do the wrong thing for the user?

NickCrews avatar Jan 25 '25 00:01 NickCrews

I think that spec is too complex for now. Let's start with plain AttributeError and only beef up the getitem errors.

cpcloud avatar Feb 12 '25 12:02 cpcloud

Sounds good, thanks, I'll try to update this implementation and let's see what we think about it!

NickCrews avatar Feb 16 '25 07:02 NickCrews

Not sure what is causing the postgres failure but I think it's unrelated.

Details
________________ test_insert_with_database_specified[postgres] _________________
[gw3] linux -- Python 3.10.16 /home/runner/work/ibis/ibis/.venv/bin/python

con_create_database = <ibis.backends.postgres.Backend object at 0x7f1e81750f70>

    @pytest.mark.notyet(
        ["flink"],
        reason="unclear whether Flink supports cross catalog/database inserts",
        raises=Py4JJavaError,
    )
    def test_insert_with_database_specified(con_create_database):
        con = con_create_database
    
        t = ibis.memtable({"a": [1, 2, 3]})
    
        with create_and_destroy_db(con) as dbname:
            con.create_table(
                table_name := gen_name("table"),
                obj=t,
                database=dbname,
                temp=con.name == "flink",
            )
            try:
>               con.insert(table_name, obj=t, database=dbname)

ibis/backends/tests/test_client.py:1412: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/backends/sql/__init__.py:464: in insert
    query = self._build_insert_from_table(
ibis/backends/sql/__init__.py:479: in _build_insert_from_table
    target_cols = self.get_schema(target, catalog=catalog, database=db).keys()
ibis/backends/postgres/__init__.py:521: in get_schema
    with self._safe_raw_sql(type_info) as cur:
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/contextlib.py:135: in __enter__
    return next(self.gen)
ibis/backends/postgres/__init__.py:710: in _safe_raw_sql
    with contextlib.closing(self.raw_sql(*args, **kwargs)) as result:
ibis/backends/postgres/__init__.py:719: in raw_sql
    query = query.sql(dialect=self.dialect)
.venv/lib/python3.10/site-packages/sqlglot/expressions.py:612: in sql
    from sqlglot.dialects import Dialect
<frozen importlib._bootstrap>:1075: in _handle_fromlist
    ???
.venv/lib/python3.10/site-packages/sqlglot/dialects/__init__.py:111: in __getattr__
    module = importlib.import_module(f"sqlglot.dialects.{module_name}")
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1050: in _gcd_import
    ???
<frozen importlib._bootstrap>:1024: in _find_and_load
    ???
<frozen importlib._bootstrap>:171: in __enter__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = _ModuleLock('sqlglot.dialects.dialect') at 13[97](https://github.com/ibis-project/ibis/actions/runs/13428575878/job/37520654230?pr=10412#step:12:98)68970723824

>   ???
E   KeyError: 139769782639488

<frozen importlib._bootstrap>:123: KeyError

NickCrews avatar Feb 20 '25 16:02 NickCrews

As you can see in the updated tests, this change is potentially breaking for users who are trying to catch IbisTypeError. But, it does unify these two errors so they only need to catch a FieldsNotFoundError, so that is a plus. I think this breakage is OK. We could make FieldsNotFoundError subclass from BOTH AttributeError and IbisTypeError? IDK, I could see how table.bogus is a sort of TypeError.

NickCrews avatar Feb 20 '25 19:02 NickCrews

@cpcloud this is ready for another review whenever you get the chance. Thank you!

NickCrews avatar Mar 06 '25 16:03 NickCrews

@cpcloud is there anything I can do to reduce the effort to review this? want to chat about this change live in eg zulip?

NickCrews avatar Mar 19 '25 17:03 NickCrews

@cpcloud any way we can get this in as a breaking change for 11.0?

NickCrews avatar Apr 23 '25 04:04 NickCrews

Yes, we can definitely get this in for 11.

cpcloud avatar Apr 23 '25 11:04 cpcloud