Proposal: Propagate gcs-connector options to GcsUtil

Open clairemcginty opened this issue 1 year ago • 4 comments

Context: I was reading GCS Parquet files via SplittableDoFn and noticed that ReadableFile#openSeekable does not propagate any of the gcs-connector options specified in my core-site.xml file. In particular, I wanted to turn off fs.gs.inputstream.fast.fail.on.not.found.enable, which is redundant in an SDF with the default empty-match treatment, and tweak fs.gs.inputstream.fadvise. It looks like these GoogleCloudStorageReadOptions options need to be set explicitly in GcsUtil and passed to any GoogleCloudStorage#open calls (see reference).

The big downside of this PR is, of course, pulling in Hadoop :( The alternative is to copy all the Configuration keys into GcsUtil by hand, which seems harder to maintain. Or, I could omit the GcsReadOptionsFactory logic entirely and leave it 100% up to the user to construct GoogleCloudStorageReadOptions instances.
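
To make the trade-off concrete, here is a rough, Hadoop-free sketch of the "map the keys by hand" alternative. It is not the code in this PR: `GcsReadOptionsSketch`, `ReadOptions`, `Fadvise`, and `fromConfig` are hypothetical stand-ins so the example runs with only the JDK, and the defaults (`true` and `AUTO`) are my assumption about the connector's behavior. Only the two key names are taken from the description above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: maps gcs-connector style keys onto a plain options holder. */
class GcsReadOptionsSketch {

  // The two keys mentioned in the PR description.
  static final String FAST_FAIL_KEY = "fs.gs.inputstream.fast.fail.on.not.found.enable";
  static final String FADVISE_KEY = "fs.gs.inputstream.fadvise";

  /** Stand-in for the connector's fadvise modes (assumed value set). */
  enum Fadvise { AUTO, RANDOM, SEQUENTIAL }

  /** Stand-in for GoogleCloudStorageReadOptions; NOT the real class. */
  static final class ReadOptions {
    final boolean fastFailOnNotFound;
    final Fadvise fadvise;

    ReadOptions(boolean fastFailOnNotFound, Fadvise fadvise) {
      this.fastFailOnNotFound = fastFailOnNotFound;
      this.fadvise = fadvise;
    }
  }

  /** Reads the two keys from a flat key/value map, falling back to assumed defaults. */
  static ReadOptions fromConfig(Map<String, String> conf) {
    boolean fastFail = Boolean.parseBoolean(conf.getOrDefault(FAST_FAIL_KEY, "true"));
    String rawFadvise = conf.getOrDefault(FADVISE_KEY, "AUTO").trim();
    Fadvise fadvise = Fadvise.valueOf(rawFadvise.toUpperCase(Locale.ROOT));
    return new ReadOptions(fastFail, fadvise);
  }

  public static void main(String[] args) {
    // Values a user might otherwise have put in core-site.xml.
    Map<String, String> conf = new HashMap<>();
    conf.put(FAST_FAIL_KEY, "false");
    conf.put(FADVISE_KEY, "random");

    ReadOptions options = fromConfig(conf);
    System.out.println(options.fastFailOnNotFound + " " + options.fadvise);
  }
}
```

Every key has to be listed and parsed explicitly here, which is the maintenance cost mentioned above: each new connector option would need its own constant, default, and parsing line.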


clairemcginty, Oct 14 '24 14:10

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot], Oct 14 '24 16:10

assign set of reviewers

clairemcginty, Oct 14 '24 16:10

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot], Oct 14 '24 16:10

assigned reviewers since at first glance, the GHA failures did not look related/might be transient? could be wrong though.

clairemcginty, Oct 14 '24 17:10

bumping this PR -- cc @scwhittle since I saw you recently made changes to GcsUtil?

clairemcginty, Oct 22 '24 12:10

Reminder, please take a look at this pr: @damondouglas

github-actions[bot], Oct 31 '24 12:10

R: @shunping (XQ suggested you to help review this)

scwhittle, Nov 05 '24 09:11

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

github-actions[bot], Nov 05 '24 09:11

> R: @shunping (XQ suggested you to help review this)

ack. will take a look today

shunping, Nov 05 '24 16:11

@shunping , whenever you have a chance I'd appreciate any feedback on this!

clairemcginty, Nov 14 '24 15:11

Running the failed precommit test again, though the failure seems unrelated to the code change here.

shunping, Nov 26 '24 16:11

Run Java PreCommit

shunping, Dec 02 '24 15:12

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 57.39%. Comparing base (160ffd5) to head (27672ab). Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #32769   +/-   ##
=========================================
  Coverage     57.39%   57.39%           
  Complexity     1474     1474           
=========================================
  Files           970      970           
  Lines        154426   154426           
  Branches       1076     1076           
=========================================
  Hits          88637    88637           
  Misses        63585    63585           
  Partials       2204     2204           

codecov[bot], Dec 03 '24 16:12