[Bug]: Go SDK Dataflow jobs fail on DataSampling disabled
What happened?
Dataflow is making DataSampling FnAPI requests even when DataSampling is disabled. But since the feature wasn't enabled, the Go SDK isn't initialzing the datasampler, leading to nil pointer panic.
2023-12-08 16:28:11.597 PST
panic({0x1264220?, 0x2786ff0?})
2023-12-08 16:28:11.597 PST
runtime/panic.go:914 +0x21f
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*DataSampler).getAllSamples(0x0)
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec/datasampler.go:79 +0x4b
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.(*DataSampler).GetSamples(0x1312c00?, {0x0?, 0xc000234480?, 0xc00053b508?})
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec/datasampler.go:62 +0x1d
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc000318000, {0x180beb0?, 0xc000091c80?}, 0xc000147ef0)
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/harness.go:668 +0x116f
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness.MainWithOptions.func4({0x180beb0?, 0xc000091c80?}, 0xc000091c80?)
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/harness.go:202 +0x74
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness.MainWithOptions({0x180beb0, 0xc000091c20}, {0x7fffd16304e4, 0xf}, {0x7fffd1630507, 0xf}, {{0xc000225e70, 0x1, 0x1}, {0xc000046010, ...}})
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/harness.go:222 +0x1022
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/init.hook()
2023-12-08 16:28:11.597 PST
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness/init/init.go:144 +0x50c
Easy enough fix, not caught sooner because we didn't run the Dataflow Go Postcommits.
First: Fix the issue, and cherry pick it into 2.53.0 Second: While this is very unlikely, this would have been caught by a simple Dataflow Go Wordcount test as a pre-commit. I'll add that.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components
- [ ] Component: Python SDK
- [ ] Component: Java SDK
- [X] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [X] Component: Google Cloud Dataflow Runner
cc: @rohdesamuel @zechenj18
Thanks for opening this!
Waiting for https://github.com/apache/beam/actions/runs/7201639105 to complete, and once that's verified, I'll make a cherry pick PR for it for the release.
Last outstanding thing here is the Dataflow Smoke test precommit.
This is non-blocking but I've punted it's release over.
Can we drop this to P2?
Agreed.
It seems this issue was resolved? The linked workflow run https://github.com/apache/beam/issues/29760#issuecomment-1854793020 was successful