Losing connections when using streaming pubsub

Open afajl opened this issue 9 months ago • 3 comments

We've discovered a regression when upgrading from pubsub 0.30 to 1. We use 8 workers/connections (ReceiveConfig.worker_count) to pubsub and stream messages using Subscription::receive.

We lose channels and are unable to reestablish them, so our streaming grinds to a halt after a couple of hours. This was never a problem with the older version, which worked perfectly for months.

From time to time we get errors like this (with some added logging):

gcloud_pubsub::subscriber: reconnect - 'Status { code: Unknown, message: "error reading a body from connection", source: Some(hyper::Error(Body, Error { kind: Reset(StreamId(1), INTERNAL_ERROR, Remote) })) }' : <subscription>

From: https://github.com/yoshidan/google-cloud-rust/blame/5c30f410af0ecfb8b087fd4be4a96e4a1e6d58af/pubsub/src/subscriber.rs#L203

The crate tries to reestablish a connection on the next loop but the error from https://github.com/yoshidan/google-cloud-rust/blob/5c30f410af0ecfb8b087fd4be4a96e4a1e6d58af/googleapis/src/google.pubsub.v1.rs#L3256 is:

Status { 
  code: Cancelled, 
  message: "operation was canceled", 
  source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) 
}

Because this status is matched as a cancellation, the worker exits and we lose a connection.
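To make the failure mode concrete, here is a minimal sketch of how a receive loop could tell a `Cancelled` status caused by a dropped transport apart from a deliberate shutdown. This is not the crate's actual code: `Code`, `Status`, `Action`, and `classify` are hypothetical stand-ins that use only the standard library, and treating `Unavailable` as retryable is an assumption.

```rust
/// Hypothetical stand-in for the gRPC status codes relevant to this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    Cancelled,
    Unknown,
    Unavailable,
    PermissionDenied,
}

/// Hypothetical simplified status: the code plus a flag saying whether the
/// error source was a transport error (e.g. "connection closed").
#[derive(Debug, Clone)]
pub struct Status {
    pub code: Code,
    pub transport_error: bool,
}

/// What the worker loop should do after a stream error.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Reconnect,
    Stop,
}

/// Decide whether a worker should reconnect or exit.
/// `shutdown_requested` models the caller cancelling the subscription.
pub fn classify(status: &Status, shutdown_requested: bool) -> Action {
    if shutdown_requested {
        // The caller asked us to stop: exit regardless of the error.
        return Action::Stop;
    }
    match status.code {
        // Cancelled with a transport source means the connection went away
        // underneath us, not that anyone asked the worker to stop.
        Code::Cancelled if status.transport_error => Action::Reconnect,
        Code::Cancelled => Action::Stop,
        // Unknown is the code seen in the "reconnect" log line above;
        // retrying Unavailable is an assumption of this sketch.
        Code::Unknown | Code::Unavailable => Action::Reconnect,
        // Non-transient errors: retrying would not help.
        Code::PermissionDenied => Action::Stop,
    }
}
```

With a rule like this, the `Cancelled` / "connection closed" status shown above would trigger a reconnect instead of ending the worker, while a cancellation requested by the caller would still stop it.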

Anyone else seen this?

afajl avatar Apr 11 '25 08:04 afajl

The only difference between their versions is the crate name. Is there any indication that the environment other than the library has changed? https://github.com/yoshidan/google-cloud-rust/compare/v20250205...v1.0.0

yoshidan avatar Apr 15 '25 09:04 yoshidan

Sorry, I mistyped. We used these versions before:

google-cloud-pubsub = "0.21"
google-cloud-default = "0.4"
google-cloud-googleapis = "0.11"

There are no relevant changes in the environment other than that.

afajl avatar Apr 28 '25 06:04 afajl

~There is an unreleased bugfix in tonic main branch which could be involved (https://github.com/hyperium/tonic/pull/2199)~ ED: nvm, tonic 0.13.0 has this already!

It could also be because of https://github.com/hyperium/h2/issues/806#issuecomment-2817348489 (fix recently merged but unreleased), which is being surfaced by a change in tonic 0.13

You can try with git dependencies for ~both~ h2.
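A minimal sketch of what that could look like in `Cargo.toml`, assuming the fix is on the default branch of the upstream h2 repository. The entry is unpinned here; adding a `rev` for a specific commit would make the build reproducible.

```toml
# Override the h2 crate pulled in transitively (via tonic/hyper) with the git version.
[patch.crates-io]
h2 = { git = "https://github.com/hyperium/h2" }
```

After running `cargo update -p h2`, `cargo tree -i h2` should show the git source if the patch was applied. Cargo only applies a patch whose version is compatible with what the dependency graph requests.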

shikhar avatar Apr 28 '25 10:04 shikhar