Daft
Daft copied to clipboard
More informative user-facing errors for SQL parsing of joins
Describe the bug
I am trying to do this:
import daft
df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})
result_df = daft.sql("SELECT * FROM df1 JOIN df2 ON df1.a = df2.a")
result_df.show()
However, as I was building this query, I ran some variations which gave me some error messages that could be more informative.
-
daft.sql("SELECT * FROM df1 JOIN df2")
:
---------------------------------------------------------------------------
InvalidSQLException Traceback (most recent call last)
Cell In[6], line 6
3 df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
4 df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})
----> 6 result_df = daft.sql("SELECT * FROM df1 JOIN df2")
7 result_df.show()
[... skipping hidden 2 frame]
File ~/code/Daft/daft/sql/sql.py:81, in sql(sql, catalog, register_globals)
78 planning_config = get_context().daft_planning_config
80 _py_catalog = catalog._catalog
---> 81 _py_logical = _sql(sql, _py_catalog, planning_config)
82 return DataFrame(LogicalPlanBuilder(_py_logical))
InvalidSQLException: Unsupported SQL: 'join type: Inner(None)'
This one made me think that I needed INNER JOIN
somehow, but that wasn't the case. I was missing my join predicate.
-
daft.sql("SELECT * FROM df1 JOIN df2 ON a")
:
---------------------------------------------------------------------------
InvalidSQLException Traceback (most recent call last)
Cell In[7], line 6
3 df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]})
4 df2 = daft.from_pydict({"a": [1, 2, 3], "c": ["daft", None, None]})
----> 6 result_df = daft.sql("SELECT * FROM df1 JOIN df2 ON a")
7 result_df.show()
[... skipping hidden 2 frame]
File ~/code/Daft/daft/sql/sql.py:81, in sql(sql, catalog, register_globals)
78 planning_config = get_context().daft_planning_config
80 _py_catalog = catalog._catalog
---> 81 _py_logical = _sql(sql, _py_catalog, planning_config)
82 return DataFrame(LogicalPlanBuilder(_py_logical))
InvalidSQLException: Unsupported SQL: 'JOIN clauses support '=' constraints combined with 'AND'; found expression = Identifier(Ident { value: "a", quote_style: None })'
To Reproduce
No response
Expected behavior
No response
Component(s)
SQL
Additional context
No response