DataflowTemplates
DataflowTemplates copied to clipboard
Remove LocalSpannerIO
Fixes #1362
Action item of https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1424#discussion_r1560010027
Affected PRs are notified: #1392 #1376
Target to merge once Template upgrade to Beam 2.56.0
Codecov Report
Attention: Patch coverage is 0%
with 13 lines
in your changes are missing coverage. Please review.
Project coverage is 41.64%. Comparing base (
24409db
) to head (db2d527
). Report is 15 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1429 +/- ##
============================================
+ Coverage 40.21% 41.64% +1.43%
+ Complexity 2796 2779 -17
============================================
Files 738 729 -9
Lines 42790 41637 -1153
Branches 4577 4457 -120
============================================
+ Hits 17207 17340 +133
+ Misses 24104 22818 -1286
Partials 1479 1479
Components | Coverage Δ | |
---|---|---|
spanner-templates | 59.04% <0.00%> (+2.17%) |
:arrow_up: |
spanner-import-export | 65.59% <0.00%> (ø) |
|
spanner-live-forward-migration | 70.96% <ø> (+7.82%) |
:arrow_up: |
spanner-live-reverse-replication | 48.42% <ø> (+5.46%) |
:arrow_up: |
spanner-bulk-migration | 77.69% <ø> (+7.63%) |
:arrow_up: |
Files | Coverage Δ | |
---|---|---|
...ogle/cloud/teleport/spanner/ApplyDDLTransform.java | 0.00% <0.00%> (ø) |
|
...google/cloud/teleport/spanner/ExportTransform.java | 17.76% <0.00%> (ø) |
|
...com/google/cloud/teleport/spanner/ReadDialect.java | 0.00% <0.00%> (ø) |
|
.../cloud/teleport/spanner/ReadInformationSchema.java | 0.00% <0.00%> (ø) |
|
...google/cloud/teleport/templates/SpannerToText.java | 0.00% <0.00%> (ø) |
|
...leport/templates/SpannerVectorEmbeddingExport.java | 0.00% <0.00%> (ø) |
|
...le/cloud/teleport/spanner/TextImportTransform.java | 44.67% <0.00%> (ø) |
|
...d/teleport/templates/common/SpannerConverters.java | 75.91% <0.00%> (ø) |
|
...google/cloud/teleport/spanner/ImportTransform.java | 26.29% <0.00%> (ø) |
I want to chime in here: I think this is the right change to make. It dramatically simplifies our infrastructure and it is generally a much better practice to have a single source of truth, rather than have 2 different similar but not quite the same IOs. More urgently, we need to fix the conflict problem mentioned in https://github.com/GoogleCloudPlatform/DataflowTemplates/issues/1362 - it is not reasonable to expect the release manager to dedicate special time to this IO every release, which is what the current setup requires.
However, I also recognize that this fork was done for a reason, and this change introduces challenges. Waiting on Beam changes to roll out is slower and means it takes fixes longer to roll out. In the short term, merging this would also regress https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1392 and https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1376. I'm not sure how urgent those are, but it would certainly be a cost. In general, I'd probably prefer we at least merge this alongside the next Beam release/templates upgrade to avoid that problem.
Moving forward, I think we have a few options:
- Merge this PR with the next Beam release (preferred IMO)
- Find another way to fix #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.
- Do nothing - IMO this is a bad option though because of the maintenance burden it imposes, and if we do this we'd likely at least require the spanner team to be more involved in Beam version upgrades
Currently test fail with error
Error message from worker: java.lang.IllegalArgumentException: Unknown spanner type FLOAT32
needs upcoming Beam release to include https://github.com/apache/beam/pull/30893
I agree that this something we need to get done. This is something we wanted to go to towards the later half of the year. LocalSpannerIO
got added in the first place due to Beam's release schedule being slower than the release velocity the Spanner team desires in launching new features to the customers.
@damccorm / @Abacn - How do other teams handle this today? I would like to understand if there is a fundamental reason why Spanner needs to this differently, or if other services do not encounter similar release velocity issues.
Can you please let me know what support is needed from the Spanner team to get this through?
There are some implementation gaps between SpannerIO in Beam and LocalSpannerIO, which need to be covered to ensure we don't create a parity difference between the two. @darshan-sj Can you please evaluate?
In LocalSpannerAccessor line number - 44 to 56, there are some retry settings which are set for ExecuteStreamingSQL method. These are required for Databoost feature to work properly. They are not present in Apache Beam and need to be moved to SpannerAccessor in Apache Beam. I would suggest going with Option 2 from @damccorm 's comment for now. Then evaluate the diff properly and remove LocalSpannerIO once the diff is migrated to Beam and that version is consumed.
Regarding Beam's release schedule, @damccorm / @Abacn How do other teams deal with this issue? Apache Beam takes 2+ months for release.
How do other teams handle this today? I would like to understand if there is a fundamental reason why Spanner needs to this differently, or if other services do not encounter similar release velocity issues.
I think this is somewhat of an unsolved problem. Releasing Beam more often may eventually be the answer, but we're not ready for that yet. Existing approaches generally are:
- Something close to what you are doing
- Deal with the slow release cycle
In general, IOs tend to be mostly stable for teams, so (2) is acceptable for most. It sounds like that isn't true for your team - what is driving your velocity needs here? (I recognize that faster releases are just generally better, but I'm also trying to understand if there are things beyond that, e.g. a big feature, etc...).
Teams will also sometimes go with option (2), and then temporarily fork the code if an urgent issue comes up
Can you please let me know what support is needed from the Spanner team to get this through?
I think we need to drive one of the options above - either:
- Merge this PR with the next Beam release - this would also require fixing the gaps @darshan-sj called out in Beam head (which I think we should do no matter what)
or
- Find another way to fix #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.
I'm actually fine with either - I think (1) is less painful long term, but I'm ok with your team making that tradeoff if you want to go with option (2). If you decide to go that route, I'd ask your team to drive that fix forward.
In the very short term: lets make sure that the local Spanner IO is totally in sync with the one at Beam head so that the next release upgrade is painless.
Note that the real issue here is #1362 - LocalSpannerIO is under org.apache.beam.io.spanner namespace. This has two issues
- Some conflicting class name. This is dangerous. If the classpath order changes the result is unpredictive. It shouldn't be fine in any case.
- This triggers an AutoValue bug https://github.com/google/auto/issues/1734 , so one cannot just remove the conflict class
It is fine to add some logic in DataflowTemplate to make use of its faster release schedule than Beam. Actually this is one of the advantage of the Template. However, the code should under "com.google.cloud.teleport" namespace
I kindly remember that originally it used the same namespace of Beam SpannerIO because some modification is needed to access package private methods of the SpannerIO. It is not a valid workaround by hacking java access modifiers. If needed, we can have a LocalSpannerIO, as a wrapper of SpannerIO, but it needs to be designed in a way that work under Templates' own namespace.
Thank you @damccorm and @Abacn for your points. I largely agree with the points you have mentioned here. Specifically -
I'm actually fine with either - I think (1) is less painful long term, but I'm ok with your team making that tradeoff if you want to go with option (2). If you decide to go that route, I'd ask your team to drive that fix forward.
Option 2 is -
Find another way to fix [Bug]: LocalSpannerIO package namespace conflict with Beam's SpannerIO #1362 so that it doesn't require release manager intervention to upgrade the Beam version every time.
One of the possible solutions is suggested by @Abacn -
It is fine to add some logic in DataflowTemplate to make use of its faster release schedule than Beam. Actually this is one of the advantage of the Template. However, the code should under "com.google.cloud.teleport" namespace
This looks like a reasonable approach, but will require some investment to make the changes and fix the namespace conflicts as well as redesign LocalSpannerIO in a way that it can extend SpannerIO.
I am not yet sure what approach we should take here - it seems to me that Spanner needs to have an internal discussion with its feature teams to align on the following -
- Why do Spanner features need to go in more frequently compared to other services? What do we lose if we wait (Option 1)?
- Is there a technical solve (similar to one suggested by @Abacn) that can solve this problem without taking a hit on the release velocity?
I will discuss this internally and get back to you. I like option 2 but I don't think it falls under the "short term" category.
For the very short term -
In the very short term: lets make sure that the local Spanner IO is totally in sync with the one at Beam head so that the next release upgrade is painless.
@darshan-sj - Can you do the following -
- Evaluate the diff between SpannerIO and LocalSpannerIO. If the diff is only the retry settings which are added to LocalSpannerIO, go ahead and create the PR on beam so that they are both in sync.
- If there is additional differences identified, reach out to the Spanner internal feature team that created the diff in LocalSpannerIO and ask them to take up this work.
Thanks @manitgupta this plan SGTM
As dataflow template now upgraded to Beam 2.56.0, the upstream Beam now also included the newer Spanner API
https://github.com/GoogleCloudPlatform/DataflowTemplates/commits/main/v1/src/main/java/org/apache/beam/sdk/io/gcp/spanner hasn't been changed since Apr 10 (the latest commit was for Upgrade Beam version to 2.55.1) so we are save to move forward
R: @manitgupta @damccorm
Technically this looks good to me, and I agree we need to do something here. I think it is up to the Spanner folks on whether they want to move forward with this or a different direction. I'll note that my strong recommendation would be to unfork because of the tech debt forking inevitably leads to.
I will discuss this internally and get back to you. I like option 2 but I don't think it falls under the "short term" category.
@manitgupta what was the outcome of this conversation?