WIP: Use Batched DoFns in DataFrame convert utilities
TODO:
- [x] Get tests passing
- [x] Clarify separation of concerns between pandas_type_compatibility and dataframe.schemas
- [x] Address TODOs (mostly error string comments)
- [x] Remove duplicated logic in UnbatchPandas and BatchRowsAsDataFrame (these should defer to BatchConverters)
Note that both https://github.com/apache/beam/pull/22626 and https://github.com/apache/beam/pull/22630 were extracted from this PR for ease of review.
Run Python 3.8 PostCommit
Run Python 3.7 PostCommit
Codecov Report
Merging #22575 (c088431) into master (63ba9c7) will decrease coverage by 0.01%. The diff coverage is 93.36%.
@@ Coverage Diff @@
## master #22575 +/- ##
==========================================
- Coverage 74.19% 74.17% -0.02%
==========================================
Files 709 712 +3
Lines 93499 93802 +303
==========================================
+ Hits 69367 69582 +215
- Misses 22855 22943 +88
Partials 1277 1277
| Flag | Coverage Δ | |
|---|---|---|
| python | 83.53% <93.36%> (-0.06%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| sdks/python/apache_beam/typehints/__init__.py | 77.77% <66.66%> (-22.23%) | :arrow_down: |
| sdks/python/apache_beam/dataframe/schemas.py | 96.62% <92.30%> (-1.05%) | :arrow_down: |
| sdks/python/apache_beam/dataframe/convert.py | 91.20% <93.47%> (+0.83%) | :arrow_up: |
| ...apache_beam/typehints/pandas_type_compatibility.py | 94.95% <94.95%> (ø) | |
| sdks/python/apache_beam/typehints/batch.py | 90.38% <100.00%> (+1.99%) | :arrow_up: |
| ...examples/inference/sklearn_mnist_classification.py | 43.75% <0.00%> (-3.75%) | :arrow_down: |
| sdks/python/apache_beam/internal/metrics/metric.py | 93.00% <0.00%> (-1.00%) | :arrow_down: |
| sdks/python/apache_beam/io/localfilesystem.py | 90.97% <0.00%> (-0.76%) | :arrow_down: |
| ...hon/apache_beam/runners/direct/test_stream_impl.py | 93.28% <0.00%> (-0.75%) | :arrow_down: |
| sdks/python/apache_beam/typehints/schemas.py | 93.84% <0.00%> (-0.48%) | :arrow_down: |
| ... and 25 more | | |
Clarify separation of concerns between pandas_type_compatibility and dataframe.schemas
dataframe.schemas:
- Maintain its current public API (possibly with deprecation notices)
- Responsible for making proxies for the DataFrame API
typehints.pandas_type_compatibility:
- pandas-Beam type mapping
- BatchConverter implementations
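To make the split concrete, here is a rough sketch of the kind of row/batch conversion that a BatchConverter implementation would own. It is not the code in this PR: `Row` and `ColumnarBatchConverter` are hypothetical names, the method names are only chosen to resemble a batch converter interface, and a dict of column lists stands in for a pandas DataFrame so the example runs with the standard library alone.

```python
from typing import Dict, Iterator, NamedTuple, Sequence


class Row(NamedTuple):
    # Hypothetical element type, standing in for a row with a schema.
    name: str
    score: float


class ColumnarBatchConverter:
    """Illustrative stand-in for a batch converter (not Beam's class).

    A dict mapping column name -> list of values plays the role of a
    DataFrame; this is an assumption made to avoid a pandas dependency.
    """

    def __init__(self, element_type):
        self._element_type = element_type
        self._columns = element_type._fields

    def produce_batch(self, elements: Sequence[Row]) -> Dict[str, list]:
        # Rows -> one columnar batch.
        return {
            col: [getattr(e, col) for e in elements]
            for col in self._columns
        }

    def explode_batch(self, batch: Dict[str, list]) -> Iterator[Row]:
        # One columnar batch -> rows, preserving the original order.
        for values in zip(*(batch[col] for col in self._columns)):
            yield self._element_type(*values)

    def get_length(self, batch: Dict[str, list]) -> int:
        # Number of rows in the batch.
        return len(batch[self._columns[0]]) if self._columns else 0


converter = ColumnarBatchConverter(Row)
rows = [Row("a", 1.0), Row("b", 2.5)]
batch = converter.produce_batch(rows)
round_tripped = list(converter.explode_batch(batch))
```

With something of this shape living next to the type mapping, the batching and unbatching transforms can defer to the converter rather than each carrying their own copy of the conversion logic.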
CC: @robertwb
Run Python 3.8 PostCommit
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:
R: @y1chi for label python.
Available commands:
- stop reviewer notifications - opt out of the automated review tooling
- remind me after tests pass - tag the comment author after tests pass
- waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this PR: @y1chi
@y1chi do you have time to review this?
R: @yeandy
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Run Python 3.8 PostCommit
Run Python Examples_Direct
Run Python Examples_Dataflow
retest this please
retest this please
Run Python Examples_Direct
Run Python Examples_Dataflow
Run Python 3.8 PostCommit
PythonDocs PreCommit has passed (https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Commit/9575/), merging