[datafusion-spark] Implement ceil&floor function for spark
Which issue does this PR close?
- Closes #15916
Rationale for this change
Add Spark-style ceil&floor function, the differences between DataFusion and Spark's implementations of mathematical functions ceil and floor:
-
Function Signatures:
- Spark provides a more flexible implementation with
ceil(x, scale)accepting two parameters - DataFusion only implements the single parameter version
ceil(x)
- Spark provides a more flexible implementation with
-
Type Support:
- Spark has broader type support, handling INTEGER, DECIMAL, DOUBLE, and FLOAT types
- DataFusion currently only supports Float32 and Float64 types
What changes are included in this PR?
Add new ceil&floor function.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.
@irenjj Here are some tests for ceil and floor that you can re-use. Sail adheres to Spark SQL syntax, so you may need to adjust as needed for this pr:
- https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_ceil.txt
- https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_floor.txt
@irenjj I'll review this by tomorrow!
I'm not a committer though, so we'll still need a review from someone else as well. cc @alamb @andygrove
Hi @irenjj. Could you explain in the PR description how these functions differ from the standard DataFusion implementation (i.e., what is Spark-specific about them)? That will help with reviews.
@parthchandra @huaxingao @comphead @mbutrovich @kazuyukitanimura, FYI, in case you want to review. In Comet, we are currently using DataFusion's ceil and floor functions and have not specialized for Spark.
@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:
- https://github.com/apache/datafusion/pull/16409
We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one
@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:
We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one
Sure! I need some time to figure out why new added ceil/floor only works with one argument(it seems datafusion has an original implementation for ceil/floor), i'll do this work next week(little busy this week) -- Sorry for dragging out this PR for so long @shehabgamin @andygrove
@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:
We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one
Sure! I need some time to figure out why new added ceil/floor only works with one argument(it seems datafusion has an original implementation for ceil/floor), i'll do this work next week(little busy this week) -- Sorry for dragging out this PR for so long @shehabgamin @andygrove
Thanks @irenjj -- no worries about the timing. I know you are busy and are also on the leading edge of these spark functions (so the process of implementation is a bit rough). Thank you for helping us figure it out
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.