beam
[Feature Request]: SpannerIO: support max commit delay
What would you like to happen?
Spanner supports setting a max commit delay for throughput-optimized writes, but there is no way to set it in SpannerIO.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
.take-issue
Commit deadlines for writes have been supported since 2020 at the RPC level and can be set in 3 ways:
- using Write.withCommitDeadline()
- using SpannerConfig.withCommitDeadline()
- using SpannerConfig.withCommitRetrySettings()
The default is 15 seconds, with retry and backoff, because with very long commit deadlines a pipeline can push Spanner into an overload situation and reduce overall throughput.
@nielm Oh, I believe that is a completely different feature. This feature request is about throughput-optimized writes: documentation link.
It was only released on March 26, 2024, a couple of weeks ago: release notes link
Yes, setting commit timeouts on individual transactions is a relatively new feature in the Spanner client libraries. However, in Beam you can only set the timeout on the entire Write transform, which makes it equivalent to the existing RPC commit deadline parameter.
In addition, there is a dependency here: the maximum commit delay is always bounded by the RPC commit deadline.
So while we appreciate the contribution, it is not necessary, as the existing parameter has the same effect.
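To make the dependency argument concrete, here is a minimal plain-Java sketch with no Beam or Spanner dependencies. `CommitDelaySketch` and `effectiveMaxCommitDelay` are hypothetical names invented for illustration; they are not part of SpannerIO or the Spanner client library. The sketch assumes, as the comment above states, that the RPC commit deadline caps any max commit delay, so the delay that can actually take effect is the smaller of the two.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of SpannerIO or the Spanner client library).
 * Assumption: a commit cannot be delayed past the RPC commit deadline, so the
 * usable max commit delay is the requested delay capped at that deadline.
 */
public final class CommitDelaySketch {

  private CommitDelaySketch() {}

  /** Returns the requested delay, capped at the commit deadline. */
  public static Duration effectiveMaxCommitDelay(Duration requestedDelay, Duration commitDeadline) {
    if (requestedDelay.isNegative() || commitDeadline.isNegative()) {
      throw new IllegalArgumentException("durations must be non-negative");
    }
    // Whichever is shorter wins: a delay longer than the deadline cannot be honored.
    return requestedDelay.compareTo(commitDeadline) <= 0 ? requestedDelay : commitDeadline;
  }

  public static void main(String[] args) {
    // 15 seconds is the default commit deadline mentioned earlier in the thread.
    Duration deadline = Duration.ofSeconds(15);
    System.out.println(effectiveMaxCommitDelay(Duration.ofMillis(100), deadline)); // PT0.1S
    System.out.println(effectiveMaxCommitDelay(Duration.ofSeconds(30), deadline)); // PT15S
  }
}
```

Under this assumption, a single transform-wide delay setting adds nothing beyond what the commit deadline already controls, which is the point made in the reply above.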