ibis
ibis copied to clipboard
feat: Better error when creating table with conflicting names due to capitalization
Is your feature request related to a problem?
Ibis treats columns named "a" and "A" differently, and you can have both in a relation. I think this makes, some backends support this.
However, there are some sharp edges when you try to use this with a backend that DOESN'T distinguish between "A" and "a", such as duckdb:
import ibis
t = ibis.memtable({"a": [1, 2, 3]})
t = t.mutate(A=t.a)
t.cache()
Details
---------------------------------------------------------------------------
CatalogException Traceback (most recent call last)
Cell In[2], line 7
5 t = ibis.memtable({"a": [1, 2, 3]})
6 t = t.mutate(A=t.a)
----> 7 t.cache()
File ~/code/ibis/ibis/expr/types/relations.py:3531, in Table.cache(self)
3458 """Cache the provided expression.
3459
3460 All subsequent operations on the returned expression will be performed
(...)
3528 └─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘
3529 """
3530 current_backend = self._find_backend(use_default=True)
-> 3531 return current_backend._cached_table(self)
File ~/code/ibis/ibis/backends/__init__.py:772, in CacheHandler._cached_table(self, table)
770 entry = self._cache_op_to_entry.get(table.op())
771 if entry is None or (cached_op := entry.cached_op_ref()) is None:
--> 772 cached_op = self._create_cached_table(util.gen_name("cached"), table).op()
773 entry = CacheEntry(
774 table.op(),
775 weakref.ref(cached_op),
(...)
778 ),
779 )
780 self._cache_op_to_entry[table.op()] = entry
File ~/code/ibis/ibis/backends/__init__.py:805, in CacheHandler._create_cached_table(self, name, expr)
804 def _create_cached_table(self, name: str, expr: ir.Table) -> ir.Table:
--> 805 return self.create_table(name, expr, schema=expr.schema(), temp=True)
File ~/code/ibis/ibis/backends/duckdb/__init__.py:197, in Backend.create_table(self, name, obj, schema, database, temp, overwrite)
193 # This is the same table as initial_table unless overwrite == True
194 final_table = sg.table(
195 name, catalog=catalog, db=database, quoted=self.compiler.quoted
196 )
--> 197 with self._safe_raw_sql(create_stmt) as cur:
198 if query is not None:
199 insert_stmt = sge.insert(query, into=initial_table).sql(self.name)
File ~/.pyenv/versions/3.11.6/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
135 del self.args, self.kwds, self.func
136 try:
--> 137 return next(self.gen)
138 except StopIteration:
139 raise RuntimeError("generator didn't yield") from None
File ~/code/ibis/ibis/backends/duckdb/__init__.py:319, in Backend._safe_raw_sql(self, *args, **kwargs)
317 @contextlib.contextmanager
318 def _safe_raw_sql(self, *args, **kwargs):
--> 319 yield self.raw_sql(*args, **kwargs)
File ~/code/ibis/ibis/backends/duckdb/__init__.py:97, in Backend.raw_sql(self, query, **kwargs)
95 with contextlib.suppress(AttributeError):
96 query = query.sql(dialect=self.name)
---> 97 return self.con.execute(query, **kwargs)
CatalogException: Catalog Error: Column with name A already exists!
related issues, though I don't think exactly the same:
- https://github.com/ibis-project/ibis/issues/9853
- https://github.com/ibis-project/ibis/issues/6772
IDK, this may also just be a bug with the duckdb backend, not a more general feature request.
What is the motivation behind your request?
I had a bug in my code where I accidentally created both columns "A" and "a". Then, I got this cryptic error later.
Describe the solution you'd like
Not sure exactly what to do here, but a few ideas:
- If a table is bound to a backend that doesn't distinguish between upper/lower (eg duckdb), could we fail earlier? Probably not a good idea, I would think we want all expressions to be portable between backends.
- catch this error whenever we actually interact with the backend, and give a nicer error there. I think this would be the solution I prefer.
What version of ibis are you running?
main
What backend(s) are you using, if any?
duckdb, not sure the story on other backends.
Code of Conduct
- [X] I agree to follow this project's Code of Conduct