[Bug]: pubsubio in Go SDK incorrectly sets Topic when Subscription is specified
What happened?
When streaming data from PubSub, there are two broad options:
- Create a new subscription on a topic
- Use an existing subscription for a topic
These are mutually exclusive.
However, the pubsubio package does not guarantee this exclusion - as can be seen here, if ReadOptions contains a non-empty Subscription, this is set on the existing payload which already has the Topic set. This violates the guarantees that the comments on the pipepb.PubSubReadPayload assume - that "Exactly one of topic or subscription should be set.".
In practice, when used in a Dataflow job, I observe that a new subscription is created - it seems the fact that Topic is specified means approach 1 (above) wins out.
If I make the following change to the pubsubio code I observe the behaviour I expect when running in Dataflow (no subscription created except for the tracker subscription if custom timestamps are used):
if opts != nil {
payload.IdAttribute = opts.IDAttribute
payload.TimestampAttribute = opts.TimestampAttribute
if opts.Subscription != "" {
payload.Topic = ""
payload.Subscription = pubsubx.MakeQualifiedSubscriptionName(project, opts.Subscription)
}
payload.WithAttributes = opts.WithAttributes
}
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- [ ] Component: Python SDK
- [ ] Component: Java SDK
- [x] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [x] Component: Google Cloud Dataflow Runner
.take-issue