[SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0
What changes were proposed in this pull request?
This PR proposes to upgrade Pandas to 2.2.0.
See What's new in 2.2.0 (January 19, 2024)
Why are the changes needed?
Pandas 2.2.0 is released, and we should support the latest Pandas.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The existing CI should pass.
Was this patch authored or co-authored using generative AI tooling?
No.
Yeah, Pandas 2.2.0 fixes many bugs, and those fixes bring a couple of behavior changes 😢
Let me fix them. Thanks for the confirm!
I believe this PR now addresses all of the Pandas 2.2.0 behavior changes. cc @HyukjinKwon @dongjoon-hyun FYI
- Is the change of python/pyspark/pandas/resample.py safe?
It breaks the previous behavior, so if we plan to ship another minor release (Spark 3.6.0), this should not be included in it.
- What happens when users decide to use an older Pandas (< 2.2.0)?
Using deprecated aliases (Y, M, H, T, S) wouldn't work.
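To make the compatibility concern concrete, here is a minimal sketch of picking an alias spelling based on the installed Pandas version. The helper names (`parse_version`, `alias_for`) are hypothetical and are not taken from this PR; the mapping covers only the five aliases mentioned above, with the replacement spellings that Pandas 2.2.0 introduced.

```python
import re

# New spellings introduced in Pandas 2.2.0, mapped back to the aliases
# they replace. Only the five aliases discussed in this thread are listed.
NEW_TO_OLD = {"YE": "Y", "ME": "M", "h": "H", "min": "T", "s": "S"}


def parse_version(version: str) -> tuple:
    """Turn a version string such as '2.1.4' into (2, 1, 4).

    Hypothetical helper: keeps only the leading digits of the first three
    components and pads missing components with zeros.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def alias_for(pandas_version: str, new_alias: str) -> str:
    """Return the spelling that the given Pandas version understands.

    Hypothetical helper: Pandas >= 2.2.0 gets the new spelling, older
    versions get the pre-2.2.0 spelling.
    """
    if parse_version(pandas_version) >= (2, 2, 0):
        return new_alias
    return NEW_TO_OLD.get(new_alias, new_alias)
```

For example, `alias_for("2.1.4", "ME")` falls back to the old spelling, while `alias_for("2.2.0", "ME")` keeps the new one.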
We should not introduce any breaking changes. Let me address them.
Thanks, @dongjoon-hyun for double checking.
Oh, wait.
I just remembered that we usually follow the Pandas behavior and mention the breaking changes separately in the release notes.
- In Spark 4.0, it is recommended to use Pandas version 2.0.0 or above with PySpark for optimal compatibility.
- In Spark 4.0, the minimum supported version for Pandas has been raised from 1.0.5 to 1.4.4 in PySpark.
...
- In Spark 4.0, when applying astype to a decimal type object, the existing missing value is changed to True instead of False from Pandas API on Spark.
- In Spark 4.0, pyspark.testing.assertPandasOnSparkEqual has been removed from Pandas API on Spark, use pyspark.pandas.testing.assert_frame_equal instead.
So maybe we should add a release note instead of reverting the breaking changes here? @dongjoon-hyun @HyukjinKwon
Just updated resample to work with old Pandas as well.
I think we can just mark the old aliases as deprecated for now to avoid breaking existing pipelines. (Also updated the release note.)
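As an illustration of the "deprecate instead of break" approach, a minimal sketch might keep accepting the old aliases while emitting a warning. The function name `normalize_rule` and the warning text are assumptions for illustration only, not the code merged in this PR; multiples such as `3T` are not handled here.

```python
import warnings

# Deprecated aliases and their Pandas 2.2.0 replacements.
DEPRECATED_ALIASES = {"Y": "YE", "M": "ME", "H": "h", "T": "min", "S": "s"}


def normalize_rule(rule: str) -> str:
    """Accept a deprecated alias, warn, and return the new spelling.

    Hypothetical helper: rules that are not deprecated pass through
    unchanged, so existing pipelines keep working.
    """
    if rule in DEPRECATED_ALIASES:
        new_rule = DEPRECATED_ALIASES[rule]
        warnings.warn(
            f"'{rule}' is deprecated, use '{new_rule}' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return new_rule
    return rule
```

With this shape, a caller passing `"M"` still gets a working rule plus a `FutureWarning`, which leaves room to remove the alias in a later release.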
Merged to master.
Thank you again, @itholic and @HyukjinKwon.
Great work, @itholic. Thank you :)
Thank you so much all for the review!