feat(api): add FieldsNotFoundError
I have been getting sick of typing some_table_or_struct.field_that_doesnt_exist_or_has_a_small_typo and then getting a useless error message. It also is annoying to do some_table.select(doesnt_exist, also_doesnt_exist), and you only get an error for the very first one. It would be better if you got back info on ALL the values that failed to get resolved.
This PR makes that UX much better.
Still need to add tests, but I wanted to get this up here for some initial thoughts before I invested more time. Is this something we want to pursue?
@cpcloud does this look like an idea worth considering? Any behavior requirements that you think I would need to make an implementation satisfy?
@cpcloud should I invest more time in this or do you think this whole idea is not the right direction?
I'm not opposed to this but I think the behavior should be different for attribute versus getitem syntax.
It all comes down to what we believe the user's intent is, to the best of our knowledge.
For attribute misspellings, I think we should leave them as plain old AttributeError and do nothing fancy. We have no idea whether the user meant a method or a column, and we should really refuse the temptation to guess.
For [] syntax, the only thing that could possibly be is a field access (in both the column and the struct field case), so upgrading the UX with FieldNotFound seems like a nice improvement.
That sounds like good behavior to me to return an AttributeError for .dot accesses. I think that makes sense to "refuse the temptation to guess" and raise an AttributeError and not a FieldNotFoundError, because that is better to be caught programmatically. But, what about being a little bit fancy/helpful, and returning a TableAttributeError, which is a subclass of AttributeError, which has the helpful "did you mean this column?" error message? Then this is the best of both worlds?
There is one bit of nuance on attribute access though. If I do t.bogus, I agree that should just be the above behavior, it is impossible to guess if the user meant a method or a column, so just return an AttributeError (or subclass of it as I suggest above). However, if I do t.select(_.bogus), then this is unambiguous that they wanted a Column, and so returning a FieldNotFoundError seems better.
In pseudocode:
# Raised on t.bogus
# Includes helpful suggestion "did you mean <best match among all columns AND attributes/methods>"
class TableAttributeError(AttributeError, IbisError): ...
# Raised on my_struct.bogus
# Includes helpful suggestion "did you mean <best match among all fields AND attributes/methods>"
# Maybe should get consolidated with TableAttributeError
class StructAttributeError(AttributeError, IbisError): ...
# Raised on table_or_struct["bogus", "bogus2"] and table_or_struct.select(_.bogus, _.bogus2) and my_struct["bogus"]
# Includes helpful suggestion "did you mean <best match among ONLY columns/fields>"
# Note that this contains ALL not found fields.
# I think that would be simpler to just have one error, and not one for single FieldNotFound.
# Should this inherit from AttributeError and/or KeyError?
# I think AttributeError since 1. these are statically known and 2. then a user can `catch AttributeError`
# and have all of these errors caught.
class FieldsNotFoundError(AttributeError, IbisError): ...
What do you think of this spec?
and do nothing fancy we should really refuse the temptation to guess
So I understand your goals, is your reasoning here so that 1. our maintenance is easier or 2. so we don't accidentally do the wrong thing for the user?
I think that spec is too complex for now. Let's start with plain AttributeError and only beef up the getitem errors.
Sounds good, thanks, I'll try to update this implementation and let's see what we think about it!
Not sure what is causing the postgres failure but I think it's unrelated.
Details
________________ test_insert_with_database_specified[postgres] _________________
[gw3] linux -- Python 3.10.16 /home/runner/work/ibis/ibis/.venv/bin/python
con_create_database = <ibis.backends.postgres.Backend object at 0x7f1e81750f70>
@pytest.mark.notyet(
["flink"],
reason="unclear whether Flink supports cross catalog/database inserts",
raises=Py4JJavaError,
)
def test_insert_with_database_specified(con_create_database):
con = con_create_database
t = ibis.memtable({"a": [1, 2, 3]})
with create_and_destroy_db(con) as dbname:
con.create_table(
table_name := gen_name("table"),
obj=t,
database=dbname,
temp=con.name == "flink",
)
try:
> con.insert(table_name, obj=t, database=dbname)
ibis/backends/tests/test_client.py:1412:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibis/backends/sql/__init__.py:464: in insert
query = self._build_insert_from_table(
ibis/backends/sql/__init__.py:479: in _build_insert_from_table
target_cols = self.get_schema(target, catalog=catalog, database=db).keys()
ibis/backends/postgres/__init__.py:521: in get_schema
with self._safe_raw_sql(type_info) as cur:
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/contextlib.py:135: in __enter__
return next(self.gen)
ibis/backends/postgres/__init__.py:710: in _safe_raw_sql
with contextlib.closing(self.raw_sql(*args, **kwargs)) as result:
ibis/backends/postgres/__init__.py:719: in raw_sql
query = query.sql(dialect=self.dialect)
.venv/lib/python3.10/site-packages/sqlglot/expressions.py:612: in sql
from sqlglot.dialects import Dialect
<frozen importlib._bootstrap>:1075: in _handle_fromlist
???
.venv/lib/python3.10/site-packages/sqlglot/dialects/__init__.py:111: in __getattr__
module = importlib.import_module(f"sqlglot.dialects.{module_name}")
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1050: in _gcd_import
???
<frozen importlib._bootstrap>:1024: in _find_and_load
???
<frozen importlib._bootstrap>:171: in __enter__
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = _ModuleLock('sqlglot.dialects.dialect') at 13[97](https://github.com/ibis-project/ibis/actions/runs/13428575878/job/37520654230?pr=10412#step:12:98)68970723824
> ???
E KeyError: 139769782639488
<frozen importlib._bootstrap>:123: KeyError
As you can see in the updated tests, this change is potentially breaking for users who are trying to catch IbisTypeError. But, it does unify these two errors so they only need to catch a FieldsNotFoundError, so that is a plus. I think this breakage is OK. We could make FieldsNotFoundError subclass from BOTH AttributeError and IbisTypeError? IDK, I could see how table.bogus is a sort of TypeError.
@cpcloud this is ready for another review whenever you get the chance. Thank you!
@cpcloud is there anything I can do to reduce the effort to review this? want to chat about this change live in eg zulip?
@cpcloud any way we can get this in as a breaking change for 11.0?
Yes, we can definitely get this in for 11.