arcade Set "TimeToLive" on Helix Workitem messages

We often get into a state where queues are so backed up that by the time the work starts, the build has long since vanished. But we waste time running those work items anyway, which then causes future work to ALSO be so backed up as to be pointless.

We could pretty easily set Expiration when queueing a Helix workitem to something close to the build's timeout, so that we aren't running work after the user has long stopped caring. It should just be a few lines of code to enable this behavior.

This would cause the spikes of usage on the highly contended queues to be self-correcting (without having to wait for days of backed-up, pointless items to fully execute).

Sep 19 '22 21:09 ChadNedzlek

I like this idea. We'd probably want to make this optional for users who can afford to wait indefinitely and want to.

Sep 19 '22 21:09 MattGal

Yeah, I sort of assumed it would just be an additional (optional) parameter to the new job API, and if it's set, we slap it on all the messages we create.
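
A minimal sketch of that shape, for illustration only. Nothing here is the real Helix job API: `WorkItemMessage`, `create_messages`, and `runnable_messages` are hypothetical names, and the sketch assumes expiry is simply "enqueue time plus TTL". It shows an optional TTL that, when set, is stamped on every message created for a job, and a consumer-side filter that skips messages whose TTL has passed. Leaving the TTL unset keeps the current "wait indefinitely" behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class WorkItemMessage:
    """Hypothetical stand-in for a queued work item message."""
    work_item_name: str
    enqueued_at: datetime
    time_to_live: Optional[timedelta] = None  # None means "never expires"

    def is_expired(self, now: datetime) -> bool:
        # A message with no TTL is always eligible to run.
        if self.time_to_live is None:
            return False
        return now >= self.enqueued_at + self.time_to_live


def create_messages(
    work_item_names: List[str],
    now: datetime,
    time_to_live: Optional[timedelta] = None,
) -> List[WorkItemMessage]:
    """Stamp the same optional TTL on every message created for a job."""
    return [WorkItemMessage(name, now, time_to_live) for name in work_item_names]


def runnable_messages(
    messages: List[WorkItemMessage], now: datetime
) -> List[WorkItemMessage]:
    """Keep only the messages that are still worth running at `now`."""
    return [m for m in messages if not m.is_expired(now)]


if __name__ == "__main__":
    start = datetime(2022, 9, 19, 21, 0, tzinfo=timezone.utc)
    # Assumed policy: TTL set close to a 2-hour build timeout.
    job = create_messages(["tests.a", "tests.b"], start, timedelta(hours=2))
    later = start + timedelta(hours=3)
    print(len(runnable_messages(job, later)))
```

In a real queueing system the expiry would more likely be enforced by the queue itself rather than by a filter like `runnable_messages`; the filter is only there to make the effect visible.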

Sep 19 '22 22:09 ChadNedzlek

I'm not sure where we'd want to fund this work. Maybe #6948? It's not... a horrible place for it. And it's sort of in line with "actually understand what helix is doing". Thoughts @markwilkie as the shepherd of that epic?

Oct 13 '22 21:10 ChadNedzlek

Yep - shared test infra works for me.

Oct 14 '22 00:10 markwilkie