beam icon indicating copy to clipboard operation
beam copied to clipboard

Changes CoGroupByKey typehint from List to Iterable

Open ryanthompson591 opened this issue 2 years ago • 15 comments

Changes CoGroupByKey to output an Iterable instead of a List.

This will be a breaking change and I put together a document to describe it.

The documentation of CoGroupByKey is very specific that it returns iterable and not a lists.

fixes #21556


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Choose reviewer(s) and mention them in a comment (R: @username).
  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels Python tests Java tests Go tests

See CI.md for more information about GitHub Actions CI.

ryanthompson591 avatar Sep 01 '22 00:09 ryanthompson591

R: tvalentyn

I want to talk about this before I submit. The coGroupByKey implementation specifically does make a dictionary with a Key to list pairing.

However, the documentation is very explicit that users should expect an iterable.

I want to double check one last time we are not making a breaking change unless we have to.

ryanthompson591 avatar Sep 01 '22 00:09 ryanthompson591

Codecov Report

Merging #22984 (4420e58) into master (ebacef9) will decrease coverage by 0.08%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22984      +/-   ##
==========================================
- Coverage   73.58%   73.50%   -0.09%     
==========================================
  Files         716      718       +2     
  Lines       95311    95438     +127     
==========================================
+ Hits        70138    70148      +10     
- Misses      23877    23989     +112     
- Partials     1296     1301       +5     
Flag Coverage Δ
python 83.18% <ø> (-0.23%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/transforms/util.py 96.06% <ø> (ø)
sdks/go/pkg/beam/io/databaseio/writer.go 0.00% <0.00%> (-31.43%) :arrow_down:
sdks/go/pkg/beam/core/state/state.go 80.85% <0.00%> (-2.75%) :arrow_down:
...s/python/apache_beam/io/gcp/bigquery_file_loads.py 87.24% <0.00%> (-0.47%) :arrow_down:
...hon/apache_beam/runners/worker/bundle_processor.py 93.42% <0.00%> (-0.13%) :arrow_down:
...thon/apache_beam/ml/inference/sklearn_inference.py 95.45% <0.00%> (-0.07%) :arrow_down:
sdks/go/pkg/beam/util/gcsx/gcs.go 27.41% <0.00%> (ø)
sdks/go/pkg/beam/artifact/stage.go 61.87% <0.00%> (ø)
sdks/go/pkg/beam/io/filesystem/util.go 96.29% <0.00%> (ø)
sdks/go/pkg/beam/io/databaseio/database.go 24.24% <0.00%> (ø)
... and 14 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 01 '22 00:09 codecov[bot]

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @TheNeuralBit for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Sep 01 '22 01:09 github-actions[bot]

R: @tvalentyn CC: @robertwb

TheNeuralBit avatar Sep 01 '22 16:09 TheNeuralBit

@ryanthompson591 I can't comment on your doc so I'll mention here - it encourages users to materialize the iterable result to a list, which can lead to OOMs. I think it's ok to mention casting to a list as a last resort, but the doc should also encourage users to iterate over the iterable without materializing it if at all possible.

TheNeuralBit avatar Sep 01 '22 16:09 TheNeuralBit

Run Python PreCommit

TheNeuralBit avatar Sep 01 '22 16:09 TheNeuralBit

@TheNeuralBit Thanks I'll update the doc to reflect that, discouraging changing the iterable to a list.

ryanthompson591 avatar Sep 01 '22 17:09 ryanthompson591

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

github-actions[bot] avatar Sep 01 '22 17:09 github-actions[bot]

Fixes issue https://github.com/apache/beam/issues/21556

ryanthompson591 avatar Sep 13 '22 14:09 ryanthompson591

Fixes issue #21556

I think for this to take effect it has to be in the PR description (which you can edit to include it) - https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

damccorm avatar Sep 13 '22 15:09 damccorm

@lukecwik I updated the CHANGES.md file. I think that since the 2.42 release is cut, it is now a good time to add this change.

ryanthompson591 avatar Sep 13 '22 15:09 ryanthompson591

Run Python 3.9 PostCommit

ryanthompson591 avatar Sep 13 '22 15:09 ryanthompson591

Thanks for the comments, I think I address the major concerns and added some new sample code.

Feel free to request edit access to that doc.

ryanthompson591 avatar Sep 13 '22 17:09 ryanthompson591

Run Python 3.9 PostCommit

ryanthompson591 avatar Sep 13 '22 17:09 ryanthompson591

Run Python PreCommit

ryanthompson591 avatar Sep 19 '22 21:09 ryanthompson591

Run Portable_Python PreCommit

ryanthompson591 avatar Sep 26 '22 19:09 ryanthompson591

LGTM

tvalentyn avatar Sep 27 '22 02:09 tvalentyn