datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

[datafusion-spark] Implement ceil&floor function for spark

Open irenjj opened this issue 7 months ago • 7 comments

Which issue does this PR close?

  • Closes #15916

Rationale for this change

Add Spark-style ceil&floor function, the differences between DataFusion and Spark's implementations of mathematical functions ceil and floor:

  1. Function Signatures:

    • Spark provides a more flexible implementation with ceil(x, scale) accepting two parameters
    • DataFusion only implements the single parameter version ceil(x)
  2. Type Support:

    • Spark has broader type support, handling INTEGER, DECIMAL, DOUBLE, and FLOAT types
    • DataFusion currently only supports Float32 and Float64 types

What changes are included in this PR?

Add new ceil&floor function.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

irenjj avatar May 06 '25 08:05 irenjj

@irenjj Here are some tests for ceil and floor that you can re-use. Sail adheres to Spark SQL syntax, so you may need to adjust as needed for this pr:

  • https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_ceil.txt
  • https://github.com/lakehq/sail/blob/main/python/pysail/tests/spark/function/test_floor.txt

shehabgamin avatar May 06 '25 21:05 shehabgamin

@irenjj I'll review this by tomorrow!

I'm not a committer though, so we'll still need a review from someone else as well. cc @alamb @andygrove

shehabgamin avatar May 07 '25 22:05 shehabgamin

Hi @irenjj. Could you explain in the PR description how these functions differ from the standard DataFusion implementation (i.e., what is Spark-specific about them)? That will help with reviews.

andygrove avatar May 07 '25 23:05 andygrove

@parthchandra @huaxingao @comphead @mbutrovich @kazuyukitanimura, FYI, in case you want to review. In Comet, we are currently using DataFusion's ceil and floor functions and have not specialized for Spark.

andygrove avatar May 07 '25 23:05 andygrove

@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:

  • https://github.com/apache/datafusion/pull/16409

We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one

alamb avatar Jun 17 '25 22:06 alamb

@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:

We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one

Sure! I need some time to figure out why new added ceil/floor only works with one argument(it seems datafusion has an original implementation for ceil/floor), i'll do this work next week(little busy this week) -- Sorry for dragging out this PR for so long @shehabgamin @andygrove

irenjj avatar Jun 17 '25 23:06 irenjj

@irenjj - I wonder if you would be willing to help pick this PR back up now that we have merged a PR with a bunch of tests from @shehabgamin here:

We are hoping to generate some example PRs to use when starting to fill out the rest of the functions and I was thinking this one might be a good one

Sure! I need some time to figure out why new added ceil/floor only works with one argument(it seems datafusion has an original implementation for ceil/floor), i'll do this work next week(little busy this week) -- Sorry for dragging out this PR for so long @shehabgamin @andygrove

Thanks @irenjj -- no worries about the timing. I know you are busy and are also on the leading edge of these spark functions (so the process of implementation is a bit rough). Thank you for helping us figure it out

alamb avatar Jun 18 '25 10:06 alamb

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Aug 19 '25 02:08 github-actions[bot]