[BUG] Can't query columns with names containing `.`

Open charlesbluca opened this issue 2 years ago • 0 comments

What happened: When attempting to query a column whose name contains `.` (e.g. `Utf8("2.0")`), we get a runtime error:

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 9
      6 c = Context()
      7 c.create_table("df", df)
----> 9 c.sql("select * from df")

File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/context.py:517, in Context.sql(self, sql, return_futures, dataframes, gpu, config_options)
    512 else:
    513     raise RuntimeError(
    514         f"Encountered unsupported `LogicalPlan` sql type: {type(sql)}"
    515     )
--> 517 return self._compute_table_from_rel(rel, return_futures)

File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/context.py:843, in Context._compute_table_from_rel(self, rel, return_futures)
    842 def _compute_table_from_rel(self, rel: "LogicalPlan", return_futures: bool = True):
--> 843     dc = RelConverter.convert(rel, context=self)
    845     # Optimization might remove some alias projects. Make sure to keep them here.
    846     select_names = [field for field in rel.getRowType().getFieldList()]

File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/physical/rel/convert.py:61, in RelConverter.convert(cls, rel, context)
     55     raise NotImplementedError(
     56         f"No relational conversion for node type {node_type} available (yet)."
     57     )
     58 logger.debug(
     59     f"Processing REL {rel} using {plugin_instance.__class__.__name__}..."
     60 )
---> 61 df = plugin_instance.convert(rel, context=context)
     62 logger.debug(f"Processed REL {rel} into {LoggableDataFrame(df)}")
     63 return df

File ~/dev/dask-sql/bug-uppercase-column-name/dask_sql/physical/rel/logical/project.py:49, in DaskProjectPlugin.convert(self, rel, context)
     46 # shortcut: if we have a column already, there is no need to re-assign it again
     47 # this is only the case if the expr is a RexInputRef
     48 if expr.getRexType() == RexType.Reference:
---> 49     index = expr.getIndex()
     50     backend_column_name = cc.get_backend_by_frontend_index(index)
     51     logger.debug(
     52         f"Not re-adding the same column {key} (but just referencing it)"
     53     )

RuntimeError: SchemaError(FieldNotFound { field: Column { relation: Some("df"), name: "a" }, valid_fields: [Column { relation: Some("df"), name: "a.b" }] })

What you expected to happen: I would expect queries on this column to be parsed without error.

Minimal Complete Verifiable Example:

import pandas as pd
from dask_sql import Context

df = pd.DataFrame({"a.b": [1]})

c = Context()
c.create_table("df", df)

c.sql("select * from df")
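Until the parsing issue is fixed, one possible stopgap is to rename the offending columns before registering the table. The sketch below is not part of dask-sql: `sanitize_column_names` is a hypothetical helper, and the choice of `_` as the replacement character is an assumption. It only builds a rename mapping and has not been verified against dask-sql itself.

```python
def sanitize_column_names(names, replacement="_"):
    """Hypothetical helper: map each column name to a dot-free, unique name.

    Not a dask-sql API; a sketch of a possible workaround for this issue.
    """
    mapping = {}
    used = set()
    for name in names:
        # Replace every "." so the SQL layer cannot read it as a qualifier.
        candidate = str(name).replace(".", replacement)
        # If the cleaned name was already assigned, append a numeric suffix.
        base, suffix = candidate, 1
        while candidate in used:
            candidate = f"{base}{replacement}{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


mapping = sanitize_column_names(["a.b", "a_b", "c"])
print(mapping)
```

With the example above, this could be applied as `df = df.rename(columns=sanitize_column_names(df.columns))` before calling `c.create_table("df", df)`. Note that names are processed in order, so a dot-free column can be renamed if an earlier cleaned name collides with it.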

Environment:

  • dask-sql version: latest main
  • Python version: 3.10
  • Operating System: ubuntu20.04
  • Install method (conda, pip, source): source

charlesbluca • Mar 28 '23 16:03