beam
beam copied to clipboard
BigQueryIO uniformize direct and export reads
Refers to #26329, also fix #20100, #21076
When using readWithDatumReader and DIRECT_READ method, the transform would fail because the parseFn is expected. Refactor the IO so the avro datumReader can be use in both cases.
In some case, it is required to get the data with the desired schema. Currently, BQ io always uses the writer schema (or table schema). Create new APIs to set the reader schema.
This refactoring contains some breaking changes:
withFormat is not exposed anymore. Indeed, it is not possible to configure a TypedRead with a DatumReaderFactory and change the format later. Data format MUST be chosen when creating the transform.
In the TypedRead.Builder, replace the DatumReaderFactory with the BigQueryReaderFactory allowing to handle both avro and arrow in uniform fashion. This alters the BigQueryIOTranslation.
I need some help on that point to handle that in a better way.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
assign set of reviewers
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:
R: @robertwb for label java. R: @ahmedabu98 for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
Some BQ integration tests are failing.
I don't know schema & data of the following big_query_import_export.parallel_read_table_row_xxx tables so I can recreated the setup in a personal GCP project. Can someone give me a hand ?
Reminder, please take a look at this pr: @robertwb @ahmedabu98
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @damondouglas for label java. R: @Abacn for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Thanks for the change! I agree with the direct and export reads should be similar, ideally make the migration seemlessly from one to another. Going forward, I would suggest we open a few smaller PRs incrementally, and I can take care of it for those does not involve breaking change. For those does cc @ahmedabu98 (who also assigned as reviewer) what is your thoughts? Probably could also open a thread in Beam devlist.
we can keep this PR open for reference.
I understand. I've been pulling many strings to achieve the original goal. I can split this PR in several chunks
Reminder, please take a look at this pr: @damondouglas @Abacn
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @kennknowles for label java. R: @ahmedabu98 for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Reminder, please take a look at this pr: @kennknowles @ahmedabu98
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @Abacn for label java. R: @johnjcasey for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
waiting on author
Codecov Report
Attention: Patch coverage is 62.67606% with 106 lines in your changes missing coverage. Please review.
Project coverage is 44.23%. Comparing base (
1ba33b8) to head (2b5b356). Report is 11 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #32360 +/- ##
============================================
+ Coverage 34.67% 44.23% +9.56%
- Complexity 0 3121 +3121
============================================
Files 372 730 +358
Lines 76997 105748 +28751
Branches 0 3360 +3360
============================================
+ Hits 26696 46775 +20079
- Misses 48484 55627 +7143
- Partials 1817 3346 +1529
| Flag | Coverage Δ | |
|---|---|---|
| java | 69.83% <62.67%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@ahmedabu98 this has been cleaned + updated. It's ready for review
Hey @RustedBones, I'll be away for the next week or so but I can take a look when I come back
CC @Abacn if you have bandwidth to take a look
Reminder, please take a look at this pr: @Abacn @johnjcasey
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @kennknowles for label java. R: @ahmedabu98 for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Reminder, please take a look at this pr: @kennknowles @ahmedabu98
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @robertwb for label java. R: @chamikaramj for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
I did some refactoring, reducing the breaking changes and allowing an easier transform upgrade
Reminder, please take a look at this pr: @robertwb @chamikaramj
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @kennknowles for label java. R: @damondouglas for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Reminder, please take a look at this pr: @kennknowles @damondouglas
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @robertwb for label java. R: @Abacn for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
waiting on author
Looks like this is a evolving PR, please let me know when it's ready. Also, good to split independent fix out so it expedites review. Thank you!
This is not evolving anymore. I rebased many times the PR to re-trigger the CI, in the hope to get all green checks in CI, without success. Seems most of the time the error is unrelated to the changes.
Will merge master in the future. It will be visible on your end that no changes are added.
Reminder, please take a look at this pr: @robertwb @Abacn
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @kennknowles for label java. R: @ahmedabu98 for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)