[Do not Merge] [WIP] Combine.perKeyWithBucketing: Combiner for reducing key cardinality

Open arunpandianp opened this issue 1 year ago • 1 comment

Combine.perKeyWithBucketing(childCombiner, numBuckets) applies the child combiner to the PCollection using numBuckets intermediate keys.

This is a POC; I'm sending it now to share and get early feedback.

TODO: Add tests, pick better names.

Example usage

input.apply(Combine.perKeyWithBucketing(yourCombineFn, desiredNumKeys))
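To make the idea of "intermediate keys" concrete, here is a minimal plain-Java sketch of one plausible reading of the description. It does not use Beam and is not taken from the PR: the class name `BucketedCombineSketch`, the helper `bucketOf`, the hash-modulo bucket assignment, and the use of a `BinaryOperator` in place of a `CombineFn` are all assumptions made for illustration. Each original key is hashed into one of `numBuckets` buckets, the child combiner runs on a per-bucket map of partial results, and the maps are then flattened back to per-key output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration only; not the PR's implementation.
final class BucketedCombineSketch {

  // Assumption: an original key is assigned to an intermediate key by hash modulo.
  static <K> int bucketOf(K key, int numBuckets) {
    return Math.floorMod(key.hashCode(), numBuckets);
  }

  static <K, V> Map<K, V> combinePerKeyWithBucketing(
      Iterable<Map.Entry<K, V>> input, BinaryOperator<V> childCombiner, int numBuckets) {
    // Stage 1: at most numBuckets intermediate keys, each holding a map of
    // original key -> partial result produced by the child combiner.
    Map<Integer, Map<K, V>> buckets = new HashMap<>();
    for (Map.Entry<K, V> e : input) {
      buckets
          .computeIfAbsent(bucketOf(e.getKey(), numBuckets), b -> new HashMap<>())
          .merge(e.getKey(), e.getValue(), childCombiner);
    }
    // Stage 2: flatten the per-bucket maps back into per-key results.
    // A key always lands in the same bucket, so no further merging is needed.
    Map<K, V> result = new HashMap<>();
    for (Map<K, V> perKey : buckets.values()) {
      result.putAll(perKey);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input =
        List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    Map<String, Integer> out = combinePerKeyWithBucketing(input, Integer::sum, 2);
    System.out.println(out.get("a")); // 4
  }
}
```

In this sketch the per-key results match those of an ordinary per-key combine; only the number of distinct keys seen by the grouping step changes, which is bounded by `numBuckets`.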

arunpandianp avatar Oct 17 '24 07:10 arunpandianp

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 58.93%. Comparing base (1e27978) to head (768981a). Report is 386 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #32831      +/-   ##
============================================
+ Coverage     57.41%   58.93%   +1.52%     
- Complexity     1475     3102    +1627     
============================================
  Files           968     1131     +163     
  Lines        154224   174643   +20419     
  Branches       1076     3330    +2254     
============================================
+ Hits          88546   102931   +14385     
- Misses        63477    68373    +4896     
- Partials       2201     3339    +1138     
Flag Coverage Δ
java 69.91% <ø> (+1.33%) :arrow_up:

codecov[bot] avatar Oct 17 '24 08:10 codecov[bot]

Closing this in favor of https://github.com/apache/beam/pull/33318

arunpandianp avatar Dec 06 '24 22:12 arunpandianp