Add Histogram transform and combiner
Add a Histogram combiner and a transform that efficiently construct linear, exponential, or explicit histograms from large input datasets within an Apache Beam pipeline. The combiner is also intended to be usable with the Group transform.
The combiner accepts input of any Number type and returns the list of histogram bucket counts. The bucket type is specified at creation time, along with bucket sizes derived from parameters such as width and/or growth factor.
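To make the bucket semantics concrete, here is a minimal plain-Java sketch of how linear, exponential, and explicit buckets can all be reduced to a sorted array of boundaries and counted with a binary search. This is not the API added in this PR: the class name `HistogramBucketsSketch`, the method names, and the choice to report underflow and overflow in the first and last slots are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: plain-Java bucket counting, not the Beam combiner from this PR. */
final class HistogramBucketsSketch {

  /** Boundaries of numBuckets equal-width buckets; bucket i covers [bounds[i], bounds[i+1]). */
  static double[] linearBounds(double start, double width, int numBuckets) {
    double[] bounds = new double[numBuckets + 1];
    for (int i = 0; i <= numBuckets; i++) {
      bounds[i] = start + i * width;
    }
    return bounds;
  }

  /** Boundaries of numBuckets buckets whose width grows by growthFactor at each step. */
  static double[] exponentialBounds(double start, double growthFactor, int numBuckets) {
    double[] bounds = new double[numBuckets + 1];
    bounds[0] = start;
    for (int i = 1; i <= numBuckets; i++) {
      bounds[i] = bounds[i - 1] * growthFactor;
    }
    return bounds;
  }

  /**
   * Counts values per bucket for sorted boundaries (explicit buckets pass their own array).
   * Assumed layout: slot 0 is underflow, slots 1..n are the buckets, slot n+1 is overflow.
   */
  static long[] count(double[] bounds, List<? extends Number> values) {
    long[] counts = new long[bounds.length + 1];
    for (Number n : values) {
      int pos = Arrays.binarySearch(bounds, n.doubleValue());
      // Exact hit on bounds[pos]: the value opens bucket pos, stored in slot pos + 1.
      // Miss: binarySearch returns -(insertionPoint) - 1, and insertionPoint is the slot.
      int slot = pos >= 0 ? pos + 1 : -pos - 1;
      counts[slot]++;
    }
    return counts;
  }
}
```

For example, `count(linearBounds(0, 10, 3), values)` returns five slots: underflow, three buckets of width 10, and overflow. A value equal to the last boundary is counted as overflow because every bucket is treated as half-open.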
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 71.40%. Comparing base (cbc480e) to head (60f1672). Report is 3 commits behind head on master.
:exclamation: Current head 60f1672 differs from pull request most recent head d76263a
Please upload reports for the commit d76263a to get more accurate results.
@@             Coverage Diff              @@
##             master   #31379      +/-   ##
============================================
+ Coverage     71.37%   71.40%   +0.02%
  Complexity     1474     1474
============================================
  Files           900      900
  Lines        114202   114180      -22
  Branches       1076     1076
============================================
+ Hits          81515    81529      +14
+ Misses        30659    30623      -36
  Partials       2028     2028
assign set of reviewers
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:
R: @kennknowles for label java. R: @Abacn for label build.
Available commands:
- stop reviewer notifications - opt out of the automated review tooling
- remind me after tests pass - tag the comment author after tests pass
- waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
What is the status of this PR? I see @tilgalas reviewed it. If the change looks good to you, the reviewer could approve it and I can help merge.
actually, the tests are not exercised currently. Need to add
dependsOn(":sdks:java:extensions:combiners:build")
to
https://github.com/apache/beam/blob/a944bf87cd03d32105d87fc986ecba5b656683bc/build.gradle.kts#L284
* What went wrong:
Execution failed for task ':sdks:java:extensions:combiners:analyzeClassesDependencies'.
> Dependency analysis found issues.
usedUndeclaredArtifacts
- org.apache.commons:commons-lang3:3.14.0@jar
need to add implementation library.java.commons_lang3 in combiners/build.gradle
After that, you can also run ./gradlew :sdks:java:extensions:combiners:build locally to check whether the build now passes.
Different tasks failed in different Java PreCommit runs, and the failures are not related to the PR. Merging for now.
PS: In the future, when addressing comments, please avoid rebase and/or force-push, which erases the linear history of the change. Just push another commit so the reviewer can check only the latest changes and does not need to go back through the whole code.