SpannerIO: support max commit delay

Open kberezin-nshl opened this issue 1 year ago • 14 comments

Addresses https://github.com/apache/beam/issues/31007


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

kberezin-nshl avatar Apr 17 '24 05:04 kberezin-nshl

R: @damccorm

kberezin-nshl avatar Apr 17 '24 05:04 kberezin-nshl

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

github-actions[bot] avatar Apr 17 '24 05:04 github-actions[bot]

assign set of reviewers

kberezin-nshl avatar Apr 17 '24 06:04 kberezin-nshl

@nielm hey, sorry for tagging you explicitly here, but I think I accidentally added the wrong reviewer to this PR. Could you please help me assign the right people?

kberezin-nshl avatar Apr 18 '24 07:04 kberezin-nshl

Effectively we already have this at the RPC level:

  /** Specifies the commit deadline. This is overridden if the CommitRetrySettings is specified. */
  public SpannerConfig withCommitDeadline(Duration commitDeadline) {
    return withCommitDeadline(ValueProvider.StaticValueProvider.of(commitDeadline));
  }

  /** Specifies the commit retry settings. Setting this overrides the commit deadline. */
  public SpannerConfig withCommitRetrySettings(RetrySettings commitRetrySettings) {
    return toBuilder().setCommitRetrySettings(commitRetrySettings).build();
  }

Both of these set the deadline by which a commit must complete (the default is 15s).

Setting a significantly longer commit deadline increases the risk of Spanner becoming overloaded and pushing back against writes, which reduces the throughput of the pipeline.

nielm avatar Apr 18 '24 08:04 nielm

@nielm oh, I believe that is a completely different feature. This PR is about throughput-optimized writes: documentation link.

It was only released on March 26th, a couple of weeks ago: release notes link

kberezin-nshl avatar Apr 18 '24 08:04 kberezin-nshl

@nielm did you have a chance to look at this again?

kberezin-nshl avatar Apr 22 '24 16:04 kberezin-nshl

The commit deadline timeout predates the per-transaction commit delay feature, but it has the same effect: limiting how long the Spanner server can spend processing a commit. The difference is that the commit delay can be set differently for different transactions, while the commit deadline is global for a Spanner client. However, in Beam it is not possible to set the commit delay per transaction, so it is also global.

There is no benefit to adding the commit delay to Beam when the commit deadline already exists, and the commit deadline will act as a maximum commit delay.

nielm avatar Apr 22 '24 17:04 nielm

@nielm I am sorry, but this is a completely different feature. Please read the documentation link I am referring to. It was released a couple of weeks ago and we are already using it in production (we had to fork SpannerIO to be able to do so), and it dramatically improved the performance of our jobs, which write millions of records to Spanner.

Setting the commit deadline does not affect the performance of the writes at all; it just tells the client library how long it can wait and retry the transaction. Setting a maximum commit delay, on the other hand, may improve performance significantly if your app doesn't care much about the exact commit timing. Documentation quote:

If you have a latency tolerant application and want to optimize throughput, setting a longer commit delay time significantly improves throughput while incurring higher latency for each write.
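
To make the difference concrete, here is a toy model of why a longer commit delay can help throughput. It is only a sketch under a simplifying assumption (the server opens a batch when a commit arrives and lets later commits join until the delay elapses); it is not how Spanner is actually implemented, and `CommitDelayModel` and `countBatches` are made-up names.

```java
import java.time.Duration;
import java.util.List;

/** Toy model only: counts how many server-side batches a stream of commits needs. */
final class CommitDelayModel {

  /**
   * Groups commit arrival times (milliseconds, ascending) into batches. A batch opens
   * when a commit arrives and stays open for maxCommitDelay; every commit that arrives
   * while it is open joins it. Returns the number of batches.
   */
  static int countBatches(List<Long> arrivalMillis, Duration maxCommitDelay) {
    int batches = 0;
    long batchClosesAt = Long.MIN_VALUE; // no batch is open yet
    for (long arrival : arrivalMillis) {
      if (arrival > batchClosesAt) {
        // The previous batch has closed, so this commit opens a new one.
        batches++;
        batchClosesAt = arrival + maxCommitDelay.toMillis();
      }
    }
    return batches;
  }
}
```

With six commits arriving 10 ms apart, a zero delay gives six batches in this model, while a 25 ms delay gives two. The commit deadline does not appear in the model at all, since it only bounds how long the client waits.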

kberezin-nshl avatar Apr 22 '24 17:04 kberezin-nshl

@johnjcasey may I ask you to help? I think we got a little confused here. We really need this feature merged, as we already use it and it has proven to be extremely effective.

kberezin-nshl avatar Apr 22 '24 17:04 kberezin-nshl

My apologies - I am indeed confusing it with the commit timeout delay...

Two comments on the PR:

  • Can you add tests in the Write test code, not the Read tests, to verify that the parameter is correctly passed to writeAtLeastOnce()?
  • Do you have any suggestion for a default value?
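
For the first point, the check could look roughly like the sketch below: a hand-written fake records what it receives, and the test asserts that the configured delay reaches it. `CommitClient`, `RecordingClient` and `BatchWriter` are hypothetical stand-ins for illustration, not the real Beam or Spanner classes, and the real writeAtLeastOnce() signature is not reproduced here.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the client; not the real Spanner interface.
interface CommitClient {
  void writeAtLeastOnce(List<String> mutations, Optional<Duration> maxCommitDelay);
}

/** Fake client that records the delay passed on every call. */
final class RecordingClient implements CommitClient {
  final List<Optional<Duration>> seenDelays = new ArrayList<>();

  @Override
  public void writeAtLeastOnce(List<String> mutations, Optional<Duration> maxCommitDelay) {
    seenDelays.add(maxCommitDelay);
  }
}

/** Hypothetical stand-in for the write step under test. */
final class BatchWriter {
  private final CommitClient client;
  private final Optional<Duration> maxCommitDelay;

  BatchWriter(CommitClient client, Optional<Duration> maxCommitDelay) {
    this.client = client;
    this.maxCommitDelay = maxCommitDelay;
  }

  /** Forwards the configured delay unchanged with every write. */
  void flush(List<String> mutations) {
    client.writeAtLeastOnce(mutations, maxCommitDelay);
  }
}
```

A test then builds a `BatchWriter` around a `RecordingClient`, calls `flush`, and checks that `seenDelays` holds exactly the configured value.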

nielm avatar Apr 22 '24 17:04 nielm

can you add tests in the Write test code not the Read tests to verify that the parameter is correctly passed to writeAtLeastOnce()

Ah, sure. No idea why I updated Read instead of Write 🤦
Done.

do you have any suggestion for a default value?

I think by default we should just leave it out (as I implemented), because there is already a documented default behavior for this case:

If you don't set a commit delay time, Spanner might set a small delay for you if it thinks that will amortize the cost of your writes.
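
In code, leaving it out by default amounts to something like the sketch below: the option is only attached when the user configured a value, so an unset delay sends nothing and Spanner's own default applies. `CommitOptionsSketch` and the string form of the option are illustrative assumptions, not the actual change in this PR.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Illustration only: builds the per-commit options from a nullable setting. */
final class CommitOptionsSketch {

  /** Returns the options to send with a commit; a null delay adds nothing. */
  static List<String> build(Duration maxCommitDelay) {
    List<String> options = new ArrayList<>();
    if (maxCommitDelay != null) {
      // Only an explicitly configured delay is passed on.
      options.add("maxCommitDelay=" + maxCommitDelay.toMillis() + "ms");
    }
    return options;
  }
}
```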

kberezin-nshl avatar Apr 22 '24 18:04 kberezin-nshl

There is a test failure, but I think it is unrelated to this PR:

org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest > testRampupThrottler FAILED
    java.lang.AssertionError at RampupThrottlingFnTest.java:107
        Caused by: org.mockito.exceptions.verification.TooManyActualInvocations at RampupThrottlingFnTest.java:53

kberezin-nshl avatar Apr 22 '24 19:04 kberezin-nshl

R: @chamikaramj for IO

nielm avatar Apr 23 '24 16:04 nielm

@nielm @chamikaramj Hi, can we get this in, please? It's been 1.5 months.

kberezin-nshl avatar May 29 '24 08:05 kberezin-nshl

In the failed test suite, the Spanner tests passed: https://ge.apache.org/s/n4qn2ounkz6b2/tests/overview

So it seems like it's unrelated.

chamikaramj avatar May 30 '24 16:05 chamikaramj

Please update "CHANGES.md" separately if needed.

chamikaramj avatar May 30 '24 16:05 chamikaramj