[BEAM-6394] Add support to write protobuf data using ProtoParquetReader

Open • cyberbeam524 opened this pull request 10 months ago • 17 comments


  • fixes #19366
  • better handling of parquet-protobuf data by switching to ProtoParquetReader, and clearer reader configuration via display data

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

  • Build python source distribution and wheels
  • Python tests
  • Java tests
  • Go tests

See CI.md for more information about GitHub Actions CI, or the workflows README for a list of phrases that trigger workflows.

cyberbeam524 • Feb 25 '25 07:02

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] • Feb 25 '25 07:02

assign set of reviewers

cyberbeam524 • Feb 25 '25 07:02

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java. R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] • Feb 25 '25 07:02

@Abacn Could you please take a look? Thanks!

cyberbeam524 • Feb 25 '25 07:02

Reminder, please take a look at this pr: @kennknowles @johnjcasey

github-actions[bot] • Mar 15 '25 12:03

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java. R: @Abacn for label io.

github-actions[bot] • Mar 19 '25 12:03

Reminder, please take a look at this pr: @robertwb @Abacn

github-actions[bot] • Mar 27 '25 12:03

waiting on author

Abacn • Mar 27 '25 14:03

@Abacn I've made the following changes based on your comments. Could you please take a look? Thank you!

  • Added a public ReaderFormat enum (AVRO, PROTO) to ParquetIO.ReadFiles, replacing the boolean useProtoReader flag
  • Kept ReaderFormat.AVRO as the default for backward compatibility
  • Added withAvroReader() and withProtoReader() builder methods
  • Refactored SplitReadFn to switch on ReaderFormat and dispatch to the Avro or Proto reader
  • Removed the "parquet.proto.ignore.unknown.fields" setting from the default path
  • Updated DisplayData to emit readerFormat.name()
  • Updated unit tests to assert on ReaderFormat and the display data accordingly
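For readers less familiar with this pattern, here is a minimal, self-contained Java sketch of the design listed above: an enum replacing a boolean flag, builder methods that select the format, a switch that dispatches on it, and display data that records the enum's name(). It is an illustration under stated assumptions, not Beam's actual code: `ReadFilesConfig`, `SplitReadFnSketch`, `openReader`, `displayData()`, and `ReaderFormatDemo` are hypothetical names, and the returned strings stand in for real Avro and ProtoParquetReader instances.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: these types model the design described in the PR
// and are NOT Apache Beam's actual ParquetIO classes.

/** Which Parquet reader a read should use. */
enum ReaderFormat {
  AVRO,
  PROTO
}

/** Hypothetical, simplified stand-in for ParquetIO.ReadFiles configuration. */
final class ReadFilesConfig {
  private final ReaderFormat readerFormat;

  private ReadFilesConfig(ReaderFormat readerFormat) {
    this.readerFormat = readerFormat;
  }

  /** Starts from AVRO so existing pipelines keep their current behavior. */
  static ReadFilesConfig create() {
    return new ReadFilesConfig(ReaderFormat.AVRO);
  }

  /** Returns a new, immutable config that uses the Avro-based reader. */
  ReadFilesConfig withAvroReader() {
    return new ReadFilesConfig(ReaderFormat.AVRO);
  }

  /** Returns a new, immutable config that uses the protobuf-based reader. */
  ReadFilesConfig withProtoReader() {
    return new ReadFilesConfig(ReaderFormat.PROTO);
  }

  ReaderFormat getReaderFormat() {
    return readerFormat;
  }

  /** Stand-in for display data: records the enum constant's name() under "readerFormat". */
  Map<String, String> displayData() {
    Map<String, String> data = new LinkedHashMap<>();
    data.put("readerFormat", readerFormat.name());
    return data;
  }
}

/** Hypothetical stand-in for the per-file dispatch in SplitReadFn. */
final class SplitReadFnSketch {
  private SplitReadFnSketch() {}

  /** Picks a reader by format; the returned label stands in for a real reader object. */
  static String openReader(ReaderFormat format, String file) {
    switch (format) {
      case AVRO:
        // Real code would build an Avro-backed Parquet reader for the file here.
        return "avro-reader:" + file;
      case PROTO:
        // Real code would build a ProtoParquetReader for the file here.
        return "proto-reader:" + file;
      default:
        throw new IllegalArgumentException("Unsupported reader format: " + format);
    }
  }
}

class ReaderFormatDemo {
  public static void main(String[] args) {
    ReadFilesConfig config = ReadFilesConfig.create().withProtoReader();
    System.out.println(config.displayData()); // prints {readerFormat=PROTO}
    // prints proto-reader:part-0.parquet
    System.out.println(SplitReadFnSketch.openReader(config.getReaderFormat(), "part-0.parquet"));
  }
}
```

Compared with a boolean flag, an enum keeps the call site self-describing (`withProtoReader()` rather than `true`) and leaves room for further formats as new constants handled by new `case` branches.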

cyberbeam524 • Apr 25 '25 18:04

Reminder, please take a look at this pr: @robertwb @Abacn

github-actions[bot] • May 03 '25 12:05

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java. R: @Abacn for label io.

github-actions[bot] • May 07 '25 12:05

Reminder, please take a look at this pr: @kennknowles @Abacn

github-actions[bot] • May 15 '25 12:05

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @ahmedabu98 for label java. R: @Abacn for label io.

github-actions[bot] • May 19 '25 12:05

Reminder, please take a look at this pr: @ahmedabu98 @Abacn

github-actions[bot] • May 27 '25 12:05

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java. R: @Abacn for label io.

github-actions[bot] • May 30 '25 12:05

Reminder, please take a look at this pr: @kennknowles @Abacn

github-actions[bot] • Jun 07 '25 12:06

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @ahmedabu98 for label java. R: @Abacn for label io.

github-actions[bot] • Jun 11 '25 12:06

Reminder, please take a look at this pr: @ahmedabu98 @Abacn

github-actions[bot] • Jun 19 '25 12:06

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java. R: @Abacn for label io.

github-actions[bot] • Jun 23 '25 12:06

Reminder, please take a look at this pr: @robertwb @Abacn

github-actions[bot] • Jul 01 '25 12:07

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @chamikaramj for label java. R: @Abacn for label io.

github-actions[bot] • Jul 04 '25 12:07

Reminder, please take a look at this pr: @chamikaramj @Abacn

github-actions[bot] • Jul 11 '25 12:07

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java. R: @Abacn for label io.

github-actions[bot] • Jul 16 '25 12:07

Reminder, please take a look at this pr: @robertwb @Abacn

github-actions[bot] • Jul 24 '25 12:07

@Abacn could you please take another look?

damccorm • Jul 25 '25 13:07

Reminder, please take a look at this pr: @robertwb @Abacn

github-actions[bot] • Aug 02 '25 12:08

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @ahmedabu98 for label java. R: @Abacn for label io.

github-actions[bot] • Aug 06 '25 12:08

Hi @ahmedabu98 and @Abacn, please review when you have a chance. Thanks!

derrickaw • Aug 12 '25 14:08

Reminder, please take a look at this pr: @ahmedabu98 @Abacn

github-actions[bot] • Aug 20 '25 12:08

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @chamikaramj for label java. R: @Abacn for label io.

github-actions[bot] • Aug 25 '25 12:08