Golang SpannerIO Implementation
Basic SpannerIO implementation as described in https://github.com/apache/beam/issues/23181.
Todo
- [ ] Additional unit tests
- [ ] Integration tests
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] Choose reviewer(s) and mention them in a comment (
R: @username). - [ ] Mention the appropriate issue in your description (for example:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead. - [ ] Update
CHANGES.mdwith noteworthy changes. - [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.
See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:
R: @riteshghorse for label go. R: @Abacn for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this pr: @riteshghorse @Abacn
I'll take a look at this sometime next week. I'm a bit bandwidth constrained until then due to being the 2.42.0 release manager and preparing for my GopherCon talk on the SDK for next Friday. That said, this should be able to make it into 2.43.0. Thank you for your patience.
The RAT failure is because we need license string at the start of each file.
Reminder, please take a look at this pr: @riteshghorse @Abacn
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @jrmccluskey for label go. R: @chamikaramj for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Reminder, please take a look at this pr: @jrmccluskey @chamikaramj
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @damccorm for label go. R: @johnjcasey for label io.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Please add the license string, and respond to Rebo's comments
waiting on author
Apologies for the delay in getting back to this - bandwidth has been a bit constrained on my side. I've corrected the feedback items.
Question re: integration tests. I would like to write an integration test but loath to throw a spanner emulator into the mix, even the inmemory https://github.com/googleapis/google-cloud-go/tree/main/spanner/spannertest. Any advice please?
Codecov Report
Merging #23285 (44a1fcc) into master (cf56af2) will decrease coverage by
0.02%. The diff coverage is52.58%.
@@ Coverage Diff @@
## master #23285 +/- ##
==========================================
- Coverage 73.46% 73.43% -0.03%
==========================================
Files 714 715 +1
Lines 96497 96613 +116
==========================================
+ Hits 70889 70947 +58
- Misses 24286 24335 +49
- Partials 1322 1331 +9
| Flag | Coverage Δ | |
|---|---|---|
| go | 51.45% <52.58%> (-0.01%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| sdks/go/pkg/beam/io/spannerio/spanner.go | 52.58% <52.58%> (ø) |
|
| sdks/go/pkg/beam/core/metrics/dumper.go | 49.20% <0.00%> (-4.77%) |
:arrow_down: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
My recommendation here is to not worry about a full on integration test for now. If you've run this on a portable runner (like Flink or Cloud Dataflow), and it works, that's sufficient E2E verification for now for example.
I'd much prefer a robust in-memory unit test of the ProcessLogic though, and that we can do by refactoring the logic to be more unit testable, in separate stand alone functions. The pipeline construction logic is a bit hairy to test outside of running pipelines, or writing elaborate pre-set up (creating a spanner instance, and tearing it down, etc). Trying to connect to a test database is also tricky because typically it requires everything to be in memory, or to have something arbitrary distributed runners can connect to (which won't be true on Dataflow, for example).
The "simple" way to test this is to migrate the DoFn code that calls the client into testable functions, and pass those functions the client. This allows unit testing the important logic of calling the spanner APIs, if not the beam specific logic of setting up the client.
This is what we do to test the pubsubx "helper" logic for example: https://github.com/apache/beam/blob/3c5ea0dfd500c2f6b97eaaf4e39612c406afc9f5/sdks/go/pkg/beam/util/pubsubx/pubsub_test.go which is used to set up some of the streaming examples against pubsub.
So, we can set some things up, like query(ctx context.Context, client spanner.Client, query string, rt reflect.Type, emit func(beam.X))
Then you can write a test that sets up the the spannertest client with the data, and passes in a closure like func(v beam.X) { values = append(values, v) }, allowing us to check all the values, and then that will validate most of that logic.
And you can validate you're getting the expected types out too. And similarly for the writing variant.
The spannertest has code you can borrow for setting up the client data properly: https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L127 and https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L212
I haven't got a great advice for this in general, as most effort for IOs is hard focused on Java, and outside of simple style guidelines, there's not much in the way of "This is the comprehensive way to write a testable IO" that I could translate from Java to Go for you. And it's not something I feel I should "wing".
This all LGTM! Thanks for persevering, and welcome to Beam!