ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat: Better error when creating table with conflicting names due to capitalization

Open NickCrews opened this issue 1 year ago • 2 comments

Is your feature request related to a problem?

Ibis treats columns named "a" and "A" differently, and you can have both in a relation. I think this makes, some backends support this.

However, there are some sharp edges when you try to use this with a backend that DOESN'T distinguish between "A" and "a", such as duckdb:

import ibis

t = ibis.memtable({"a": [1, 2, 3]})
t = t.mutate(A=t.a)
t.cache()
Details
---------------------------------------------------------------------------
CatalogException                          Traceback (most recent call last)
Cell In[2], line 7
      5 t = ibis.memtable({"a": [1, 2, 3]})
      6 t = t.mutate(A=t.a)
----> 7 t.cache()

File ~/code/ibis/ibis/expr/types/relations.py:3531, in Table.cache(self)
   3458 """Cache the provided expression.
   3459 
   3460 All subsequent operations on the returned expression will be performed
   (...)
   3528 └─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘
   3529 """
   3530 current_backend = self._find_backend(use_default=True)
-> 3531 return current_backend._cached_table(self)

File ~/code/ibis/ibis/backends/__init__.py:772, in CacheHandler._cached_table(self, table)
    770 entry = self._cache_op_to_entry.get(table.op())
    771 if entry is None or (cached_op := entry.cached_op_ref()) is None:
--> 772     cached_op = self._create_cached_table(util.gen_name("cached"), table).op()
    773     entry = CacheEntry(
    774         table.op(),
    775         weakref.ref(cached_op),
   (...)
    778         ),
    779     )
    780     self._cache_op_to_entry[table.op()] = entry

File ~/code/ibis/ibis/backends/__init__.py:805, in CacheHandler._create_cached_table(self, name, expr)
    804 def _create_cached_table(self, name: str, expr: ir.Table) -> ir.Table:
--> 805     return self.create_table(name, expr, schema=expr.schema(), temp=True)

File ~/code/ibis/ibis/backends/duckdb/__init__.py:197, in Backend.create_table(self, name, obj, schema, database, temp, overwrite)
    193 # This is the same table as initial_table unless overwrite == True
    194 final_table = sg.table(
    195     name, catalog=catalog, db=database, quoted=self.compiler.quoted
    196 )
--> 197 with self._safe_raw_sql(create_stmt) as cur:
    198     if query is not None:
    199         insert_stmt = sge.insert(query, into=initial_table).sql(self.name)

File ~/.pyenv/versions/3.11.6/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/code/ibis/ibis/backends/duckdb/__init__.py:319, in Backend._safe_raw_sql(self, *args, **kwargs)
    317 @contextlib.contextmanager
    318 def _safe_raw_sql(self, *args, **kwargs):
--> 319     yield self.raw_sql(*args, **kwargs)

File ~/code/ibis/ibis/backends/duckdb/__init__.py:97, in Backend.raw_sql(self, query, **kwargs)
     95 with contextlib.suppress(AttributeError):
     96     query = query.sql(dialect=self.name)
---> 97 return self.con.execute(query, **kwargs)

CatalogException: Catalog Error: Column with name A already exists!

related issues, though I don't think exactly the same:

  • https://github.com/ibis-project/ibis/issues/9853
  • https://github.com/ibis-project/ibis/issues/6772

IDK, this may also just be a bug with the duckdb backend, not a more general feature request.

What is the motivation behind your request?

I had a bug in my code where I accidentally created both columns "A" and "a". Then, I got this cryptic error later.

Describe the solution you'd like

Not sure exactly what to do here, but a few ideas:

  • If a table is bound to a backend that doesn't distinguish between upper/lower (eg duckdb), could we fail earlier? Probably not a good idea, I would think we want all expressions to be portable between backends.
  • catch this error whenever we actually interact with the backend, and give a nicer error there. I think this would be the solution I prefer.

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb, not sure the story on other backends.

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

NickCrews avatar Sep 24 '24 23:09 NickCrews