river icon indicating copy to clipboard operation
river copied to clipboard

JobGetAvailable call has no timeout

Open bgentry opened this issue 3 months ago • 1 comments

Was skimming through this to debug a customer issue. Realized that we explicitly remove all cancellation from the context, which is intentional because it's derived from work context and we don't want to halt an in-progress fetch during shutdown because we may have already locked/updated those jobs and would need to cleanly release them to avoid them needing to be rescued. With the pro pilot, we have a retry loop with individualized per-attempt timeouts applied. However, for the standard pilot, there are no additional timeouts applied at all to this method:

https://github.com/riverqueue/river/blob/ed11967c1f9d22bab3d9eb8a642f4ee807ddee22/producer.go#L735-L764

What's your thought @brandur? Should we apply a fixed timeout to these fetches in the standard pilot so that it doesn't hang indefinitely (freezing the producer and/or blocking shutdown), or something else?

bgentry avatar Sep 03 '25 14:09 bgentry

No strong opinion.

As I'm sure I've said a few times before, I don't recall a single time that timeouts have ever saved us from a production issue. We have very, very few high-level timeouts in our code and things are fine — in my experience, an extremely high level fallback timeout, or something at a low level, will always bubble up with an error and prevent infinite blocking. I know you've said you've experienced otherwise, but this is one I'll have to see myself to believe. I also don't love them because they make debugging very difficult. They don't produce distinct error messages, so if you get one you have to hunt down where it came from (i.e. which timeout caused this problem? there's no way to tell) and that tends to be difficult. Wrapping every operation in its own timeout also makes code quite verbose and ugly.

I guess it would make sense here to make the standard pilot consistent with the pro pilot, so if that seems to make things a little better here, seems okay.

brandur avatar Sep 04 '25 02:09 brandur