Daft icon indicating copy to clipboard operation
Daft copied to clipboard

More informative user-facing errors for SQL parsing of joins

Open jaychia opened this issue 4 months ago • 0 comments

Describe the bug

I am trying to do this:

import daft

df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})

result_df = daft.sql("SELECT * FROM df1 JOIN df2 ON df1.a = df2.a")
result_df.show()

However, as I was building this query, I ran some variations which gave me some error messages that could be more informative.

  1. daft.sql("SELECT * FROM df1 JOIN df2"):
---------------------------------------------------------------------------
InvalidSQLException                       Traceback (most recent call last)
Cell In[6], line 6
      3 df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
      4 df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})
----> 6 result_df = daft.sql("SELECT * FROM df1 JOIN df2")
      7 result_df.show()

    [... skipping hidden 2 frame]

File ~/code/Daft/daft/sql/sql.py:81, in sql(sql, catalog, register_globals)
     78 planning_config = get_context().daft_planning_config
     80 _py_catalog = catalog._catalog
---> 81 _py_logical = _sql(sql, _py_catalog, planning_config)
     82 return DataFrame(LogicalPlanBuilder(_py_logical))

InvalidSQLException: Unsupported SQL: 'join type: Inner(None)'

This one made me think that I needed INNER JOIN somehow, but that wasn't the case. I was missing my join predicate.

  1. daft.sql("SELECT * FROM df1 JOIN df2 ON a"):
---------------------------------------------------------------------------
InvalidSQLException                       Traceback (most recent call last)
Cell In[7], line 6
      3 df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
      4 df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})
----> 6 result_df = daft.sql("SELECT * FROM df1 JOIN df2 ON a")
      7 result_df.show()

    [... skipping hidden 2 frame]

File ~/code/Daft/daft/sql/sql.py:81, in sql(sql, catalog, register_globals)
     78 planning_config = get_context().daft_planning_config
     80 _py_catalog = catalog._catalog
---> 81 _py_logical = _sql(sql, _py_catalog, planning_config)
     82 return DataFrame(LogicalPlanBuilder(_py_logical))

InvalidSQLException: Unsupported SQL: 'JOIN clauses support '=' constraints combined with 'AND'; found expression = Identifier(Ident { value: "a", quote_style: None })'

To Reproduce

No response

Expected behavior

No response

Component(s)

SQL

Additional context

No response

jaychia avatar Sep 26 '24 18:09 jaychia