Pub/Sub topic and subscription creation timeout
The issue I am seeing is that pubsubc fails to create topics and subscriptions when the container starts up. When this happens, all I can see in the logs is Operation timed out, and nothing is retried. I'm guessing this happens when the server takes too long to start: pubsubc is run in the background before the actual server has started, so I guess there is a race.
If I run pubsubc manually afterwards, the topics are created fine.
I'm running the image in minikube using onechart.
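One way to test the race theory would be to block until the emulator port accepts TCP connections before creating anything. The following is only a minimal sketch in Python, not how the image's entrypoint actually works: `wait_for_port` and its parameters are hypothetical names, and 8681 is simply the emulator port that appears in the logs further down this thread.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only means the server is listening;
            # it does not prove the emulator is fully initialised.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumption: the emulator listens on 8681, as in the logs in this thread.
    if wait_for_port("localhost", 8681):
        print("emulator port is open; safe to create topics")
    else:
        print("emulator port never opened")
```

If the timeouts disappear once topic creation is gated on a check like this, that would point fairly strongly at the startup race.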
Can't track this one down at all! As far as I can tell, the issues started with gcloud v407: that version and every later one are affected.
I've updated the autobuild script to attach the gcloud version number, so pinning to thekevjames/gcloud-pubsub-emulator:406.0.0 will let folks avoid this issue, but no dice on figuring out a real fix.
If anyone can make use of a more recent tag than 406.0.0 and figure out what's going on here, or even provide more info to help with debugging, that'd be much appreciated!
Fixed! thekevjames/gcloud-pubsub-emulator:420.0.0 and later tags should be working once more.
Unfortunately, I did need to switch from Alpine to Debian, so the images are (hopefully temporarily?) much larger (~2GB), but that's about the only option I had.
@TheKevJames sorry I wasn't checking updates to this. I've tried out 420.0.0, but the problem I'm seeing now is that the readiness port, which gets opened once the topics and subscriptions have been created, is not responding to requests. Any request on that port just hangs indefinitely. This even happens when I exec into the container and curl localhost:8682.
In the logs I see:
configuring project: my-proj
- creating topic: my-topic
- creating subscription: my-sub
Done building projects/topics/subscriptions! Opening readiness port...
So the topic and subscription are getting created, but when I curl the port no response comes back, and I can see this in the logs:
GET / HTTP/1.1
Host: 172.17.0.5:8682
User-Agent: kube-probe/1.23
Accept: */*
Connection: close
GET / HTTP/1.1
Host: localhost:8682
User-Agent: curl/7.74.0
Accept: */*
So I can see it is logging k8s readiness probe requests and the requests I am sending from within the container.
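To narrow down what "hangs" means here, it may help to separate "the port accepts connections but never answers" from "the port is not reachable at all". Below is a rough Python sketch of such a probe. `probe` and its return labels are made up for illustration, and it sends the same kind of plain GET / request shown in the logs above.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a port as 'unreachable', 'open-no-response',
    'closed-without-data', or 'responded'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        request = (
            f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        try:
            data = sock.recv(1024)
        except socket.timeout:
            # The listener accepted the connection but sent nothing back.
            return "open-no-response"
        except OSError:
            return "closed-without-data"
    return "responded" if data else "closed-without-data"


if __name__ == "__main__":
    # 8682 is the readiness port mentioned in this thread.
    print(probe("localhost", 8682))
```

A result of open-no-response would match what is described above: the listener sees the request but never sends a reply. In that case a plain TCP-connect readiness check might behave differently from an HTTP one.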
I've also just seen the original problem happen again, though it only seems to happen when we run the container on our Jenkins instance:
Executing: /usr/lib/google-cloud-sdk/platform/pubsub-emulator/bin/cloud-pubsub-emulator --host=0.0.0.0 --port=8681
[pubsub] This is the Google Pub/Sub fake.
[pubsub] Implementation may be incomplete or differ from the real system.
[pubsub] Apr 12, 2023 10:46:35 AM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: IAM integration is disabled. IAM policy methods and ACL checks are not supported
[pubsub] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[pubsub] SLF4J: Defaulting to no-operation (NOP) logger implementation
[pubsub] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Operation timed out
[pubsub] Apr 12, 2023 10:46:44 AM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: Server started, listening on 8681
I think it would be a good idea to add some resiliency to the calls that create the topics and subscriptions, as I'm pretty sure that in this case a simple retry would do the trick.
I'll try to find some time to play around with it.
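As a rough illustration of the "simple retry" idea, here is a small Python sketch that reruns a command with exponential backoff until it exits with status 0. It is not how the image works today. `run_with_retry` and its parameters are hypothetical, and it assumes the creation step exits non-zero when it hits Operation timed out, which I have not verified.

```python
import subprocess
import time
from typing import Sequence


def run_with_retry(cmd: Sequence[str], attempts: int = 5,
                   delay: float = 1.0, backoff: float = 2.0) -> bool:
    """Run `cmd` until it exits 0, waiting longer after each failure.

    Returns True on success, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            print(f"attempt {attempt} failed (exit {result.returncode}); "
                  f"retrying in {delay:.1f}s")
            time.sleep(delay)
            delay *= backoff  # exponential backoff between attempts
    return False
```

Here `cmd` would be whatever pubsubc invocation the entrypoint already uses. With the defaults, the waits are 1s, 2s, 4s and 8s, which would comfortably cover the roughly nine-second startup gap visible in the timestamps above.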
Interesting! I've so far been unable to replicate the issues you're reporting above, so any additional information you can find would be very helpful!