beam icon indicating copy to clipboard operation
beam copied to clipboard

[Bug]: pubsubio in Go SDK incorrectly sets Topic when Subscription is specified

Open tgw-thirdfort opened this issue 6 months ago • 1 comments

What happened?

When streaming data from PubSub, there are two broad options:

  1. Create a new subscription on a topic
  2. Use an existing subscription for a topic

These are mutually exclusive.

However, the pubsubio package does not guarantee this exclusion - as can be seen here, if ReadOptions contains a non-empty Subscription, this is set on the existing payload which already has the Topic set. This violates the guarantees that the comments on the pipepb.PubSubReadPayload assume - that "Exactly one of topic or subscription should be set.".

In practice, when used in a Dataflow job, I observe that a new subscription is created - it seems the fact that Topic is specified means approach 1 (above) wins out.

If I make the following change to the pubsubio code I observe the behaviour I expect when running in Dataflow (no subscription created except for the tracker subscription if custom timestamps are used):

if opts != nil {
		payload.IdAttribute = opts.IDAttribute
		payload.TimestampAttribute = opts.TimestampAttribute
		if opts.Subscription != "" {
			payload.Topic = ""
			payload.Subscription = pubsubx.MakeQualifiedSubscriptionName(project, opts.Subscription)
		}
		payload.WithAttributes = opts.WithAttributes
	}

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • [ ] Component: Python SDK
  • [ ] Component: Java SDK
  • [x] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Infrastructure
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [x] Component: Google Cloud Dataflow Runner

tgw-thirdfort avatar Jun 15 '25 18:06 tgw-thirdfort

.take-issue

TanuSharma2511 avatar Jun 16 '25 05:06 TanuSharma2511