beam
beam copied to clipboard
Changes CoGroupByKey typehint from List to Iterable
Changes CoGroupByKey to output an Iterable instead of a List.
This will be a breaking change and I put together a document to describe it.
The documentation of CoGroupByKey is very specific that it returns iterable and not a lists.
fixes #21556
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] Choose reviewer(s) and mention them in a comment (
R: @username
). - [ ] Mention the appropriate issue in your description (for example:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead. - [ ] Update
CHANGES.md
with noteworthy changes. - [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.
See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.
R: tvalentyn
I want to talk about this before I submit. The coGroupByKey implementation specifically does make a dictionary with a Key to list pairing.
However, the documentation is very explicit that users should expect an iterable.
I want to double check one last time we are not making a breaking change unless we have to.
Codecov Report
Merging #22984 (4420e58) into master (ebacef9) will decrease coverage by
0.08%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## master #22984 +/- ##
==========================================
- Coverage 73.58% 73.50% -0.09%
==========================================
Files 716 718 +2
Lines 95311 95438 +127
==========================================
+ Hits 70138 70148 +10
- Misses 23877 23989 +112
- Partials 1296 1301 +5
Flag | Coverage Δ | |
---|---|---|
python | 83.18% <ø> (-0.23%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
sdks/python/apache_beam/transforms/util.py | 96.06% <ø> (ø) |
|
sdks/go/pkg/beam/io/databaseio/writer.go | 0.00% <0.00%> (-31.43%) |
:arrow_down: |
sdks/go/pkg/beam/core/state/state.go | 80.85% <0.00%> (-2.75%) |
:arrow_down: |
...s/python/apache_beam/io/gcp/bigquery_file_loads.py | 87.24% <0.00%> (-0.47%) |
:arrow_down: |
...hon/apache_beam/runners/worker/bundle_processor.py | 93.42% <0.00%> (-0.13%) |
:arrow_down: |
...thon/apache_beam/ml/inference/sklearn_inference.py | 95.45% <0.00%> (-0.07%) |
:arrow_down: |
sdks/go/pkg/beam/util/gcsx/gcs.go | 27.41% <0.00%> (ø) |
|
sdks/go/pkg/beam/artifact/stage.go | 61.87% <0.00%> (ø) |
|
sdks/go/pkg/beam/io/filesystem/util.go | 96.29% <0.00%> (ø) |
|
sdks/go/pkg/beam/io/databaseio/database.go | 24.24% <0.00%> (ø) |
|
... and 14 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer
:
R: @TheNeuralBit for label python.
Available commands:
-
stop reviewer notifications
- opt out of the automated review tooling -
remind me after tests pass
- tag the comment author after tests pass -
waiting on author
- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
R: @tvalentyn CC: @robertwb
@ryanthompson591 I can't comment on your doc so I'll mention here - it encourages users to materialize the iterable result to a list, which can lead to OOMs. I think it's ok to mention casting to a list as a last resort, but the doc should also encourage users to iterate over the iterable without materializing it if at all possible.
Run Python PreCommit
@TheNeuralBit Thanks I'll update the doc to reflect that, discouraging changing the iterable to a list.
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Fixes issue https://github.com/apache/beam/issues/21556
Fixes issue #21556
I think for this to take effect it has to be in the PR description (which you can edit to include it) - https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
@lukecwik I updated the CHANGES.md file. I think that since the 2.42 release is cut, it is now a good time to add this change.
Run Python 3.9 PostCommit
Thanks for the comments, I think I address the major concerns and added some new sample code.
Feel free to request edit access to that doc.
Run Python 3.9 PostCommit
Run Python PreCommit
Run Portable_Python PreCommit
LGTM