Add Histogram transform and combiner

Open dilnazanlid opened this issue 1 year ago • 1 comments

Develop a Histogram combiner and a transform that efficiently construct linear, exponential, or explicit histograms from large input datasets within an Apache Beam pipeline. A further objective is for the combiner to be usable with the Group transform.
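
To make the three bucket types concrete, here is a minimal sketch of how a value could be mapped to a bucket index under each scheme. It is plain Java with no Beam dependency and is not the code from this PR: the class name `BucketIndexSketch`, the method names, the half-open `[lower, upper)` convention, and the choice to return -1 for out-of-range values are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper (not the PR's API): maps a value to a histogram bucket index. */
final class BucketIndexSketch {

  /** Linear: bucket i covers [start + i*width, start + (i+1)*width). Returns -1 if out of range. */
  static int linearIndex(double value, double start, double width, int count) {
    if (Double.isNaN(value) || value < start) {
      return -1;
    }
    int index = (int) Math.floor((value - start) / width);
    return index < count ? index : -1;
  }

  /** Exponential: bucket i covers [scale * growth^i, scale * growth^(i+1)). Returns -1 if out of range. */
  static int exponentialIndex(double value, double scale, double growth, int count) {
    if (Double.isNaN(value) || value < scale) {
      return -1;
    }
    // Walk the upper bounds by repeated multiplication instead of using logarithms,
    // which avoids rounding surprises exactly at bucket boundaries.
    double upper = scale * growth;
    for (int i = 0; i < count; i++) {
      if (value < upper) {
        return i;
      }
      upper *= growth;
    }
    return -1;
  }

  /** Explicit: bounds must be strictly increasing; bucket i covers [bounds[i], bounds[i+1]). */
  static int explicitIndex(double value, double[] bounds) {
    int pos = Arrays.binarySearch(bounds, value);
    // A miss returns -(insertionPoint) - 1, so the containing bucket is insertionPoint - 1.
    int index = pos >= 0 ? pos : -pos - 2;
    return (index >= 0 && index < bounds.length - 1) ? index : -1;
  }
}
```

With these assumed conventions, `linearIndex(25.0, 0.0, 10.0, 5)` returns 2, `exponentialIndex(8.0, 1.0, 2.0, 4)` returns 3, and `explicitIndex(10.0, new double[] {0, 10, 100, 1000})` returns 1. A real implementation may treat underflow and overflow differently, for example by counting them in dedicated buckets.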

The combiner accepts input of any generic Number type and returns the list of histogram bucket counts. The bucket type is specified at creation time, along with bucket sizes derived from parameters such as width and/or growth factor.
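
The combiner contract described above can be sketched as follows. This is a standalone illustration rather than the class added by this PR: it mirrors the four-method shape of Beam's `Combine.CombineFn` (`createAccumulator`, `addInput`, `mergeAccumulators`, `extractOutput`) without depending on Beam, and the class name `LinearHistogramCombinerSketch`, its constructor parameters, the linear-only bucketing, and the decision to drop out-of-range values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a CombineFn-style histogram combiner; not the PR's class. */
final class LinearHistogramCombinerSketch<T extends Number> {
  private final double start;
  private final double width;
  private final int numBuckets;

  LinearHistogramCombinerSketch(double start, double width, int numBuckets) {
    if (width <= 0 || numBuckets <= 0) {
      throw new IllegalArgumentException("width and numBuckets must be positive");
    }
    this.start = start;
    this.width = width;
    this.numBuckets = numBuckets;
  }

  /** One counter per bucket, all starting at zero. */
  long[] createAccumulator() {
    return new long[numBuckets];
  }

  /** Counts one input; values outside the bucket range are dropped (a simplification). */
  long[] addInput(long[] accumulator, T input) {
    double value = input.doubleValue();
    int index = (int) Math.floor((value - start) / width);
    if (!Double.isNaN(value) && index >= 0 && index < numBuckets) {
      accumulator[index]++;
    }
    return accumulator;
  }

  /** Merging is an element-wise sum, so partial results can be combined in any order. */
  long[] mergeAccumulators(Iterable<long[]> accumulators) {
    long[] merged = createAccumulator();
    for (long[] accumulator : accumulators) {
      for (int i = 0; i < numBuckets; i++) {
        merged[i] += accumulator[i];
      }
    }
    return merged;
  }

  /** Returns the bucket counts as a list, matching the description in this PR. */
  List<Long> extractOutput(long[] accumulator) {
    List<Long> counts = new ArrayList<>(numBuckets);
    for (long count : accumulator) {
      counts.add(count);
    }
    return counts;
  }
}
```

For example, with `start = 0`, `width = 10`, and three buckets, adding 5 and 12.5 to one accumulator and 7L, 29.9f, 30, and -1 to another, then merging and extracting, yields `[2, 1, 1]`. Because the merge step is associative and commutative, a runner is free to combine partial accumulators from different workers in any order.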


dilnazanlid avatar May 23 '24 06:05 dilnazanlid

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] avatar May 23 '24 07:05 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.40%. Comparing base (cbc480e) to head (60f1672). Report is 3 commits behind head on master.

:exclamation: Current head 60f1672 differs from pull request most recent head d76263a

Please upload reports for the commit d76263a to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #31379      +/-   ##
============================================
+ Coverage     71.37%   71.40%   +0.02%     
  Complexity     1474     1474              
============================================
  Files           900      900              
  Lines        114202   114180      -22     
  Branches       1076     1076              
============================================
+ Hits          81515    81529      +14     
+ Misses        30659    30623      -36     
  Partials       2028     2028              


codecov[bot] avatar Jun 13 '24 16:06 codecov[bot]

assign set of reviewers

dilnazanlid avatar Jun 18 '24 11:06 dilnazanlid

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java. R: @Abacn for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Jun 18 '24 11:06 github-actions[bot]

What is the status of this PR? I see @tilgalas reviewed it. If the change looks good to you, the reviewer could approve it and I can help merge.

Abacn avatar Jun 18 '24 18:06 Abacn

Actually, the tests are not currently exercised. You need to add

 dependsOn(":sdks:java:extensions:combiners:build")

to

https://github.com/apache/beam/blob/a944bf87cd03d32105d87fc986ecba5b656683bc/build.gradle.kts#L284

Abacn avatar Jun 24 '24 17:06 Abacn

* What went wrong:
Execution failed for task ':sdks:java:extensions:combiners:analyzeClassesDependencies'.
> Dependency analysis found issues.
  usedUndeclaredArtifacts
   - org.apache.commons:commons-lang3:3.14.0@jar

You need to add implementation library.java.commons_lang3 to combiners/build.gradle.

After that, you can also run ./gradlew :sdks:java:extensions:combiners:build locally to check whether the build now passes.

Abacn avatar Jun 25 '24 20:06 Abacn

A different task failed in each trial of Java PreCommit, and the failures are not related to this PR, so I am merging for now.

PS: In the future, when addressing comments, please avoid rebasing and/or force-pushing, which erases the linear history of the change. Just push another commit so the reviewer can check only the latest changes without going back through the whole code.

Abacn avatar Jun 26 '24 19:06 Abacn