WIP: Use Batched DoFns in DataFrame convert utilities

Open TheNeuralBit opened this issue 3 years ago • 4 comments

TODO:

  • [x] Get tests passing
  • [x] Clarify separation of concerns between pandas_type_compatibility and dataframe.schemas
  • [x] Address TODOs (mostly error string comments)
  • [x] Remove duplicated logic in UnbatchPandas and BatchRowsAsDataFrame (these should defer to BatchConverters)

Note that both https://github.com/apache/beam/pull/22626 and https://github.com/apache/beam/pull/22630 were extracted from this PR for ease of review.

TheNeuralBit avatar Aug 03 '22 18:08 TheNeuralBit

Run Python 3.8 PostCommit

TheNeuralBit avatar Aug 03 '22 18:08 TheNeuralBit

Run Python 3.7 PostCommit

TheNeuralBit avatar Aug 04 '22 22:08 TheNeuralBit

Codecov Report

Merging #22575 (c088431) into master (63ba9c7) will decrease coverage by 0.01%. The diff coverage is 93.36%.

@@            Coverage Diff             @@
##           master   #22575      +/-   ##
==========================================
- Coverage   74.19%   74.17%   -0.02%     
==========================================
  Files         709      712       +3     
  Lines       93499    93802     +303     
==========================================
+ Hits        69367    69582     +215     
- Misses      22855    22943      +88     
  Partials     1277     1277              
Flag      Coverage Δ
python    83.53% <93.36%> (-0.06%) :arrow_down:

Impacted Files                                          Coverage Δ
sdks/python/apache_beam/typehints/__init__.py           77.77% <66.66%> (-22.23%) :arrow_down:
sdks/python/apache_beam/dataframe/schemas.py            96.62% <92.30%> (-1.05%) :arrow_down:
sdks/python/apache_beam/dataframe/convert.py            91.20% <93.47%> (+0.83%) :arrow_up:
...apache_beam/typehints/pandas_type_compatibility.py   94.95% <94.95%> (ø)
sdks/python/apache_beam/typehints/batch.py              90.38% <100.00%> (+1.99%) :arrow_up:
...examples/inference/sklearn_mnist_classification.py   43.75% <0.00%> (-3.75%) :arrow_down:
sdks/python/apache_beam/internal/metrics/metric.py      93.00% <0.00%> (-1.00%) :arrow_down:
sdks/python/apache_beam/io/localfilesystem.py           90.97% <0.00%> (-0.76%) :arrow_down:
...hon/apache_beam/runners/direct/test_stream_impl.py   93.28% <0.00%> (-0.75%) :arrow_down:
sdks/python/apache_beam/typehints/schemas.py            93.84% <0.00%> (-0.48%) :arrow_down:
... and 25 more

codecov[bot] avatar Aug 04 '22 22:08 codecov[bot]

Clarify separation of concerns between pandas_type_compatibility and dataframe.schemas

dataframe.schemas:

  • Maintain its current public API (possibly with deprecation notices)
  • Responsible for making proxies for the DataFrame API

typehints.pandas_type_compatibility:

  • pandas-Beam type mapping
  • BatchConverter implementations
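
For readers unfamiliar with the idea, here is a rough, standalone sketch of what a batch converter is responsible for: turning individual rows into a columnar batch and back. It is not the code in this PR and does not import Beam or pandas. `ToyRowBatchConverter`, `Row`, and the dict-of-lists batch are hypothetical stand-ins for a DataFrame-backed converter, and the method names are only assumed to echo the BatchConverter interface.

```python
from typing import Any, Dict, Iterable, Iterator, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical element type with a fixed schema."""
    name: str
    score: float


# Stand-in for a DataFrame: column name -> list of column values.
ColumnarBatch = Dict[str, List[Any]]


class ToyRowBatchConverter:
    """Illustrative only; not Beam's BatchConverter class."""

    def __init__(self, row_type):
        self._row_type = row_type
        self._columns = row_type._fields

    def produce_batch(self, elements: Iterable[Any]) -> ColumnarBatch:
        # Pivot a sequence of rows into one list per column.
        batch: ColumnarBatch = {col: [] for col in self._columns}
        for element in elements:
            for col, value in zip(self._columns, element):
                batch[col].append(value)
        return batch

    def explode_batch(self, batch: ColumnarBatch) -> Iterator[Any]:
        # Pivot back: walk the columns in lockstep and rebuild each row.
        for values in zip(*(batch[col] for col in self._columns)):
            yield self._row_type(*values)

    def get_length(self, batch: ColumnarBatch) -> int:
        # Number of rows in the batch (0 if the schema has no columns).
        return len(batch[self._columns[0]]) if self._columns else 0


converter = ToyRowBatchConverter(Row)
rows = [Row("a", 1.0), Row("b", 2.5)]
batch = converter.produce_batch(rows)
round_tripped = list(converter.explode_batch(batch))
```

Keeping both directions in one object is what would let utilities such as UnbatchPandas and BatchRowsAsDataFrame defer to a single implementation instead of each carrying its own conversion logic.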

TheNeuralBit avatar Aug 04 '22 23:08 TheNeuralBit

CC: @robertwb

TheNeuralBit avatar Aug 12 '22 22:08 TheNeuralBit

Run Python 3.8 PostCommit

TheNeuralBit avatar Aug 12 '22 23:08 TheNeuralBit

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @y1chi for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Aug 13 '22 00:08 github-actions[bot]

Reminder, please take a look at this pr: @y1chi

github-actions[bot] avatar Aug 20 '22 12:08 github-actions[bot]

@y1chi do you have time to review this?

TheNeuralBit avatar Aug 22 '22 16:08 TheNeuralBit

R: @yeandy

TheNeuralBit avatar Aug 24 '22 23:08 TheNeuralBit

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

github-actions[bot] avatar Aug 24 '22 23:08 github-actions[bot]

Run Python 3.8 PostCommit

TheNeuralBit avatar Aug 31 '22 01:08 TheNeuralBit

Run Python Examples_Direct

TheNeuralBit avatar Aug 31 '22 01:08 TheNeuralBit

Run Python Examples_Dataflow

TheNeuralBit avatar Aug 31 '22 01:08 TheNeuralBit

retest this please

TheNeuralBit avatar Aug 31 '22 14:08 TheNeuralBit

retest this please

TheNeuralBit avatar Aug 31 '22 17:08 TheNeuralBit

Run Python Examples_Direct

TheNeuralBit avatar Aug 31 '22 17:08 TheNeuralBit

Run Python Examples_Dataflow

TheNeuralBit avatar Aug 31 '22 17:08 TheNeuralBit

Run Python 3.8 PostCommit

TheNeuralBit avatar Aug 31 '22 17:08 TheNeuralBit

PythonDocs PreCommit has passed (https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Commit/9575/), merging

TheNeuralBit avatar Aug 31 '22 21:08 TheNeuralBit