Set "TimeToLive" on Helix Workitem messages
We often get into a state where queues are so backed up that by the time the work starts, the build that queued it is long gone. But we waste time running the work anyway, which then causes future work to ALSO be so backed up as to be pointless.
We could pretty easily set Expiration when queueing a Helix workitem to something close to the build's timeout, so that we aren't running work long after the user has stopped caring. It should just be a few lines of code to enable this behavior.
This would make usage spikes on the most heavily contended queues self-correcting (without having to wait days for backed up, pointless items to fully execute).
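A rough sketch of the idea, in Python for illustration only. `WorkItemMessage`, `time_to_live`, and `drain` are hypothetical names, not Helix's actual API; the assumption is just that each message records when it was queued plus an optional time to live, and that expired messages get skipped instead of run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class WorkItemMessage:
    """Hypothetical stand-in for a queued Helix workitem message."""
    name: str
    enqueued_at: datetime
    time_to_live: Optional[timedelta] = None  # None means "never expires"

    def is_expired(self, now: datetime) -> bool:
        # A message with no time to live is always still valid.
        if self.time_to_live is None:
            return False
        return now >= self.enqueued_at + self.time_to_live


def drain(queue: List[WorkItemMessage], now: datetime) -> List[WorkItemMessage]:
    """Return only the items still worth running; expired ones are dropped."""
    return [message for message in queue if not message.is_expired(now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    build_timeout = timedelta(hours=2)  # assumed build timeout, reused as the TTL
    backlog = [
        WorkItemMessage("stale", now - timedelta(hours=3), build_timeout),
        WorkItemMessage("fresh", now - timedelta(minutes=30), build_timeout),
    ]
    for message in drain(backlog, now):
        print(message.name)
```

With a two hour time to live, the item queued three hours ago is dropped without running, while the one queued thirty minutes ago still executes. That is the self-correcting part: a backlog older than the timeout drains for free.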
I like this idea. We'd probably want to make this optional for users who can afford to wait indefinitely and want to.
Yeah, I sort of assumed it would just be an additional (optional) parameter to the new job API, and if it's set, we slap it on all the messages we create.
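Something like this, as a sketch. It's Python rather than the real job API, and `create_job_messages`, `work_item`, and `time_to_live_seconds` are made-up names for illustration. The only point is that the parameter defaults to "no expiration" and, when set, gets stamped on every message we create.

```python
from datetime import timedelta
from typing import Dict, List, Optional


def create_job_messages(
    work_items: List[str],
    time_to_live: Optional[timedelta] = None,
) -> List[Dict[str, object]]:
    """Build one message per workitem (hypothetical shape, not the real API).

    If time_to_live is given, it is applied to every message in the job.
    If it is omitted, messages carry no expiration and can wait indefinitely.
    """
    messages: List[Dict[str, object]] = []
    for name in work_items:
        message: Dict[str, object] = {"work_item": name}
        if time_to_live is not None:
            message["time_to_live_seconds"] = time_to_live.total_seconds()
        messages.append(message)
    return messages
```

Callers who can afford to wait just don't pass `time_to_live` and get today's behavior; everyone else passes something close to their build timeout.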
I'm not sure where we'd want to fund this work. Maybe #6948? It's not... a horrible place for it. And it's sort of in line with "actually understand what helix is doing". Thoughts @markwilkie as the shepherd of that epic?
Yep - shared test infra works for me.