python-bigquery-dataframes icon indicating copy to clipboard operation
python-bigquery-dataframes copied to clipboard

deps: update to ibis-framework 9.x and newer sqlglot

Open tswast opened this issue 1 year ago • 3 comments

Note: Creating as a Draft PR now, as all unit tests are passing, but I know I need to make updates for the custom operators to get the system tests passing.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • [ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • [ ] Ensure the tests and linter pass
  • [ ] Code coverage does not decrease (if any source code was changed)
  • [ ] Appropriate docs were updated (if necessary)

Fixes internal issue 350749011 🦕

tswast avatar Jul 08 '24 16:07 tswast

Getting

pytest tests/system/small/test_dataframe.py::test_dataframe_agg_int_single_string
...
../../envs/bigframes/lib/python3.11/site-packages/sqlglot/dialects/bigquery.py:53: in <listcomp>
    exp.PropertyEQ(this=exp.to_identifier(name), expression=fld)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

name = Column(
  this=Identifier(this=col_18, quoted=True)), quoted = None, copy = True

    def to_identifier(name, quoted=None, copy=True):
        """Builds an identifier.
    
        Args:
            name: The name to turn into an identifier.
            quoted: Whether to force quote the identifier.
            copy: Whether to copy name if it's an Identifier.
    
        Returns:
            The identifier ast node.
        """
    
        if name is None:
            return None
    
        if isinstance(name, Identifier):
            identifier = maybe_copy(name, copy)
        elif isinstance(name, str):
            identifier = Identifier(
                this=name,
                quoted=not SAFE_IDENTIFIER_RE.match(name) if quoted is None else quoted,
            )
        else:
>           raise ValueError(f"Name needs to be a string or an Identifier, got: {name.__class__}")
E           ValueError: Name needs to be a string or an Identifier, got: <class 'sqlglot.expressions.Column'>

../../envs/bigframes/lib/python3.11/site-packages/sqlglot/expressions.py:6798: ValueError

for DataFrame.agg and related tests like describe. Might have something to do with how we're unnesting memtables. Will have to investigate further.

tswast avatar Jul 09 '24 15:07 tswast

Re: FAILED tests/system/small/test_pandas.py::test_get_dummies_dataframe[kwargs2] - AssertionError: DataFrame.iloc[:, 11] (column name="time_col.11:14:34.701606") are different

Looks like the time scalar is losing microsecond precision.

  COALESCE(`t0`.`time_col` = time(11, 14, 34), FALSE) AS `col_15`,
  COALESCE(`t0`.`time_col` = time(11, 41, 43), FALSE) AS `col_17`,
  COALESCE(`t0`.`time_col` = time(12, 0, 0), FALSE) AS `col_19`,
  COALESCE(`t0`.`time_col` = time(15, 16, 17), FALSE) AS `col_21`,
  COALESCE(`t0`.`time_col` = time(23, 59, 59), FALSE) AS `col_23`,

Edit: looks like an ibis or maybe sqlglot bug. Filed https://github.com/ibis-project/ibis/issues/9609

tswast avatar Jul 15 '24 22:07 tswast

Marking as do not merge because to fix 3.9, we'll need to vendor ibis 9.x into third_party.

tswast avatar Jul 16 '24 16:07 tswast

Reading through some of the review here, I would really love for y'all to chime in more proactively on the Ibis issue tracker.

I understand that contributing to Ibis isn't top priority, but it might help ease some of the pain here if we had some advance notice about any difficulty a particular change might create. I also realize that may not be knowable until you actually do the work to upgrade to a later version of Ibis.

Generally speaking, if there's a decision that would cause lots of headaches, or that we expect might, we'd like to discuss and get feedback on it.

cpcloud avatar Sep 13 '24 18:09 cpcloud

Thanks for your input, @cpcloud. We appreciate you taking the time to engage with us on this thread. We generally aim to report issues that are relevant to the wider Ibis community directly in the issue tracker, so everyone can benefit from the discussion and resolution. However, we also recognize that some issues might be specific to our use case, and in those cases, we'll handle them internally. We value your insights and want to ensure we're collaborating effectively. Are there any particular issues you've observed in our usage of Ibis that you'd like to discuss further? Your feedback helps us to better understand your priorities and identify areas for improvement.

chelsea-lin avatar Sep 13 '24 21:09 chelsea-lin