dask-sql
[BUG] Can't query columns with names containing `.`
What happened:
When querying a table that has a column whose name contains `.` (e.g. `Utf8("2.0")`), we get a runtime error:
RuntimeError Traceback (most recent call last)
Cell In[1], line 9
6 c = Context()
7 c.create_table("df", df)
----> 9 c.sql("select * from df")
File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/context.py:517, in Context.sql(self, sql, return_futures, dataframes, gpu, config_options)
512 else:
513 raise RuntimeError(
514 f"Encountered unsupported `LogicalPlan` sql type: {type(sql)}"
515 )
--> 517 return self._compute_table_from_rel(rel, return_futures)
File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/context.py:843, in Context._compute_table_from_rel(self, rel, return_futures)
842 def _compute_table_from_rel(self, rel: "LogicalPlan", return_futures: bool = True):
--> 843 dc = RelConverter.convert(rel, context=self)
845 # Optimization might remove some alias projects. Make sure to keep them here.
846 select_names = [field for field in rel.getRowType().getFieldList()]
File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/physical/rel/convert.py:61, in RelConverter.convert(cls, rel, context)
55 raise NotImplementedError(
56 f"No relational conversion for node type {node_type} available (yet)."
57 )
58 logger.debug(
59 f"Processing REL {rel} using {plugin_instance.__class__.__name__}..."
60 )
---> 61 df = plugin_instance.convert(rel, context=context)
62 logger.debug(f"Processed REL {rel} into {LoggableDataFrame(df)}")
63 return df
File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/physical/rel/logical/project.py:49, in DaskProjectPlugin.convert(self, rel, context)
46 # shortcut: if we have a column already, there is no need to re-assign it again
47 # this is only the case if the expr is a RexInputRef
48 if expr.getRexType() == RexType.Reference:
---> 49 index = expr.getIndex()
50 backend_column_name = cc.get_backend_by_frontend_index(index)
51 logger.debug(
52 f"Not re-adding the same column {key} (but just referencing it)"
53 )
RuntimeError: SchemaError(FieldNotFound { field: Column { relation: Some("df"), name: "a" }, valid_fields: [Column { relation: Some("df"), name: "a.b" }] })
What you expected to happen: I would expect queries on this column to be parsed without error.
Minimal Complete Verifiable Example:
import pandas as pd
from dask_sql import Context
df = pd.DataFrame({"a.b": [1]})
c = Context()
c.create_table("df", df)
c.sql("select * from df")
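Until this is fixed, one possible workaround is to rename the affected columns before registering the table. The sketch below only builds the rename mapping; `dot_free_names` and its `replacement` parameter are hypothetical names introduced here, not part of dask-sql, and it assumes the `.` in the column name is what triggers the error.

```python
def dot_free_names(columns, replacement="_"):
    """Map each column name to a version without '.' (hypothetical helper)."""
    mapping = {}
    seen = set()
    for name in columns:
        new_name = str(name).replace(".", replacement)
        # Refuse to silently merge two columns into one name.
        if new_name in seen:
            raise ValueError(f"renaming {name!r} would duplicate {new_name!r}")
        seen.add(new_name)
        mapping[name] = new_name
    return mapping


mapping = dot_free_names(["a.b", "c"])
print(mapping)  # {'a.b': 'a_b', 'c': 'c'}
```

With the DataFrame from the example, calling `df.rename(columns=dot_free_names(df.columns))` before `c.create_table("df", df)` would register the column as `a_b`. Whether that avoids the error has not been verified here.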
Environment:
- dask-sql version: latest main
- Python version: 3.10
- Operating System: ubuntu20.04
- Install method (conda, pip, source): source