beam icon indicating copy to clipboard operation
beam copied to clipboard

Golang SpannerIO Implementation

Open christogav opened this issue 3 years ago • 1 comments

Basic SpannerIO implementation as described in https://github.com/apache/beam/issues/23181.

Todo

  • [ ] Additional unit tests
  • [ ] Integration tests

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Choose reviewer(s) and mention them in a comment (R: @username).
  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels Python tests Java tests Go tests

See CI.md for more information about GitHub Actions CI.

christogav avatar Sep 18 '22 03:09 christogav

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @riteshghorse for label go. R: @Abacn for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Sep 18 '22 04:09 github-actions[bot]

Reminder, please take a look at this pr: @riteshghorse @Abacn

github-actions[bot] avatar Sep 28 '22 12:09 github-actions[bot]

I'll take a look at this sometime next week. I'm a bit bandwidth constrained until then due to being the 2.42.0 release manager and preparing for my GopherCon talk on the SDK for next Friday. That said, this should be able to make it into 2.43.0. Thank you for your patience.

lostluck avatar Sep 29 '22 17:09 lostluck

The RAT failure is because we need license string at the start of each file.

riteshghorse avatar Oct 06 '22 17:10 riteshghorse

Reminder, please take a look at this pr: @riteshghorse @Abacn

github-actions[bot] avatar Oct 14 '22 12:10 github-actions[bot]

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label go. R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

github-actions[bot] avatar Oct 18 '22 13:10 github-actions[bot]

Reminder, please take a look at this pr: @jrmccluskey @chamikaramj

github-actions[bot] avatar Oct 26 '22 12:10 github-actions[bot]

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label go. R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

github-actions[bot] avatar Oct 28 '22 12:10 github-actions[bot]

Please add the license string, and respond to Rebo's comments

johnjcasey avatar Oct 28 '22 14:10 johnjcasey

waiting on author

Abacn avatar Oct 28 '22 14:10 Abacn

Apologies for the delay in getting back to this - bandwidth has been a bit constrained on my side. I've corrected the feedback items.

Question re: integration tests. I would like to write an integration test but loath to throw a spanner emulator into the mix, even the inmemory https://github.com/googleapis/google-cloud-go/tree/main/spanner/spannertest. Any advice please?

christogav avatar Nov 06 '22 14:11 christogav

Codecov Report

Merging #23285 (44a1fcc) into master (cf56af2) will decrease coverage by 0.02%. The diff coverage is 52.58%.

@@            Coverage Diff             @@
##           master   #23285      +/-   ##
==========================================
- Coverage   73.46%   73.43%   -0.03%     
==========================================
  Files         714      715       +1     
  Lines       96497    96613     +116     
==========================================
+ Hits        70889    70947      +58     
- Misses      24286    24335      +49     
- Partials     1322     1331       +9     
Flag Coverage Δ
go 51.45% <52.58%> (-0.01%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/go/pkg/beam/io/spannerio/spanner.go 52.58% <52.58%> (ø)
sdks/go/pkg/beam/core/metrics/dumper.go 49.20% <0.00%> (-4.77%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Nov 09 '22 01:11 codecov[bot]

My recommendation here is to not worry about a full on integration test for now. If you've run this on a portable runner (like Flink or Cloud Dataflow), and it works, that's sufficient E2E verification for now for example.

I'd much prefer a robust in-memory unit test of the ProcessLogic though, and that we can do by refactoring the logic to be more unit testable, in separate stand alone functions. The pipeline construction logic is a bit hairy to test outside of running pipelines, or writing elaborate pre-set up (creating a spanner instance, and tearing it down, etc). Trying to connect to a test database is also tricky because typically it requires everything to be in memory, or to have something arbitrary distributed runners can connect to (which won't be true on Dataflow, for example).

The "simple" way to test this is to migrate the DoFn code that calls the client into testable functions, and pass those functions the client. This allows unit testing the important logic of calling the spanner APIs, if not the beam specific logic of setting up the client.

This is what we do to test the pubsubx "helper" logic for example: https://github.com/apache/beam/blob/3c5ea0dfd500c2f6b97eaaf4e39612c406afc9f5/sdks/go/pkg/beam/util/pubsubx/pubsub_test.go which is used to set up some of the streaming examples against pubsub.

So, we can set some things up, like query(ctx context.Context, client spanner.Client, query string, rt reflect.Type, emit func(beam.X))

Then you can write a test that sets up the the spannertest client with the data, and passes in a closure like func(v beam.X) { values = append(values, v) }, allowing us to check all the values, and then that will validate most of that logic.

And you can validate you're getting the expected types out too. And similarly for the writing variant.

The spannertest has code you can borrow for setting up the client data properly: https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L127 and https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L212

I haven't got a great advice for this in general, as most effort for IOs is hard focused on Java, and outside of simple style guidelines, there's not much in the way of "This is the comprehensive way to write a testable IO" that I could translate from Java to Go for you. And it's not something I feel I should "wing".

lostluck avatar Nov 09 '22 01:11 lostluck

This all LGTM! Thanks for persevering, and welcome to Beam!

lostluck avatar Nov 22 '22 22:11 lostluck