deps: update to ibis-framework 9.x and newer sqlglot
Note: Creating as a Draft PR now, as all unit tests are passing, but I know I need to make updates for the custom operators to get the system tests passing.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)
Fixes internal issue 350749011 🦕
Getting
pytest tests/system/small/test_dataframe.py::test_dataframe_agg_int_single_string
...
../../envs/bigframes/lib/python3.11/site-packages/sqlglot/dialects/bigquery.py:53: in <listcomp>
exp.PropertyEQ(this=exp.to_identifier(name), expression=fld)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = Column(
this=Identifier(this=col_18, quoted=True)), quoted = None, copy = True
def to_identifier(name, quoted=None, copy=True):
"""Builds an identifier.
Args:
name: The name to turn into an identifier.
quoted: Whether to force quote the identifier.
copy: Whether to copy name if it's an Identifier.
Returns:
The identifier ast node.
"""
if name is None:
return None
if isinstance(name, Identifier):
identifier = maybe_copy(name, copy)
elif isinstance(name, str):
identifier = Identifier(
this=name,
quoted=not SAFE_IDENTIFIER_RE.match(name) if quoted is None else quoted,
)
else:
> raise ValueError(f"Name needs to be a string or an Identifier, got: {name.__class__}")
E ValueError: Name needs to be a string or an Identifier, got: <class 'sqlglot.expressions.Column'>
../../envs/bigframes/lib/python3.11/site-packages/sqlglot/expressions.py:6798: ValueError
for DataFrame.agg and related tests like describe. Might have something to do with how we're unnesting memtables. Will have to investigate further.
Re: FAILED tests/system/small/test_pandas.py::test_get_dummies_dataframe[kwargs2] - AssertionError: DataFrame.iloc[:, 11] (column name="time_col.11:14:34.701606") are different
Looks like the time scalar is losing microsecond precision.
COALESCE(`t0`.`time_col` = time(11, 14, 34), FALSE) AS `col_15`,
COALESCE(`t0`.`time_col` = time(11, 41, 43), FALSE) AS `col_17`,
COALESCE(`t0`.`time_col` = time(12, 0, 0), FALSE) AS `col_19`,
COALESCE(`t0`.`time_col` = time(15, 16, 17), FALSE) AS `col_21`,
COALESCE(`t0`.`time_col` = time(23, 59, 59), FALSE) AS `col_23`,
Edit: looks like an ibis or maybe sqlglot bug. Filed https://github.com/ibis-project/ibis/issues/9609
Marking as do not merge because to fix 3.9, we'll need to vendor ibis 9.x into third_party.
Reading through some of the review here, I would really love for y'all to chime in more proactively on the Ibis issue tracker.
I understand that contributing to Ibis isn't top priority, but it might help ease some of the pain here if we had some advance notice about any difficulty a particular change might create. I also realize that may not be knowable until you actually do the work to upgrade to a later version of Ibis.
Generally speaking, if there's a decision that would cause lots of headaches, or that we expect might, we'd like to discuss and get feedback on it.
Thanks for your input, @cpcloud. We appreciate you taking the time to engage with us on this thread. We generally aim to report issues that are relevant to the wider Ibis community directly in the issue tracker, so everyone can benefit from the discussion and resolution. However, we also recognize that some issues might be specific to our use case, and in those cases, we'll handle them internally. We value your insights and want to ensure we're collaborating effectively. Are there any particular issues you've observed in our usage of Ibis that you'd like to discuss further? Your feedback helps us to better understand your priorities and identify areas for improvement.