datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

feat: rand expression support

Open akupchinskiy opened this issue 1 year ago • 7 comments

Which issue does this PR close?

Closes #1198

Rationale for this change

Support of the spark rand() expression

What changes are included in this PR?

  • rand() expression implementation
  • partition-awareness of the planner

How are these changes tested?

Spark compatibility tests and expression correctness test are included in the PR

akupchinskiy avatar Dec 24 '24 11:12 akupchinskiy

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 58.98%. Comparing base (f09f8af) to head (b95f4b7). Report is 282 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1199      +/-   ##
============================================
+ Coverage     56.12%   58.98%   +2.85%     
- Complexity      976     1141     +165     
============================================
  Files           119      130      +11     
  Lines         11743    12872    +1129     
  Branches       2251     2421     +170     
============================================
+ Hits           6591     7592    +1001     
- Misses         4012     4059      +47     
- Partials       1140     1221      +81     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Dec 28 '24 06:12 codecov-commenter

Thanks @akupchinskiy. I plan on reviewing this after the holidays.

andygrove avatar Dec 30 '24 03:12 andygrove

Are the partition related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the rand() expression.

mbutrovich avatar Jan 02 '25 15:01 mbutrovich

Are the partition related changes necessary for this PR? Otherwise, it might be better to reduce the scope to just the rand() expression.

There is a handful of expressions besides rand() relying on the partition index. All of them implement nondetermenistic trait providing a hook method to initialize a state before a partition evaluation for spark runtime.

Encapsulation-wise, I agree that the scope of the partition exposure should be limited. But I could not find another way to extract it other than making it a part of a planner struct.

akupchinskiy avatar Jan 05 '25 08:01 akupchinskiy

@akupchinskiy do you plan to resolve the conflicts?

kazuyukitanimura avatar Mar 06 '25 01:03 kazuyukitanimura

@akupchinskiy do you plan to resolve the conflicts?

Yeah, thanks for the reminder. Will do it tomorrow

akupchinskiy avatar Mar 06 '25 18:03 akupchinskiy

@kazuyukitanimura could you trigger the workflow?

akupchinskiy avatar Mar 07 '25 16:03 akupchinskiy

I upmerged this PR and re-triggered the workflows. Sorry for the delay @akupchinskiy

andygrove avatar Jun 24 '25 17:06 andygrove