
[feature]: cli job run retry on "Error fetching deployment"

Open resmo opened this issue 3 years ago • 5 comments

Proposal

Add an option --retry (default 3) for monitoring the job run status.

Use-cases

We run a reverse proxy as a centralized entry point to the Nomad API for our devs. The proxy itself runs on Nomad.

If our CI re-deploys that proxy while accessing the API through it, the CLI may error out with "Error fetching deployment", even with canary and rolling updates.

We added a retry on error in our CI, but we wish the CLI job monitoring would retry a few times by itself, e.g. 3 times with a delay of one second.
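For anyone who needs the CI-side workaround in the meantime, here is a minimal sketch of such a retry wrapper. It is not the reporter's actual CI code and not a Nomad feature: the function name `run_with_retry`, its parameters, and the choice of 3 attempts with a 1 second delay are assumptions taken from the numbers mentioned above. Note that it re-runs the whole command, which re-submits the job, whereas the proposal is for the CLI to retry only the monitoring step.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=1.0, run=subprocess.run, sleep=time.sleep):
    """Run `cmd` until it exits with status 0, trying at most `attempts` times.

    `run` and `sleep` are injectable so the logic can be exercised without
    calling a real binary (hypothetical helper, not part of the nomad CLI).
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

In a pipeline this could be called as `run_with_retry(["nomad", "job", "run", "proxy.nomad"])`, where `proxy.nomad` is a placeholder job file name. A fixed delay is used because that is what the issue asks for; a backoff would be a different design.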

Attempted Solutions

canary and rolling update

resmo avatar Feb 13 '22 16:02 resmo

Hey @resmo

This would be a great feature to have so thanks for making this, we'll address this internally and get back to you with our thoughts 👍. Would you be willing to raise a PR for this?

Amier3 avatar Feb 15 '22 15:02 Amier3

Thanks @Amier3

I'm afraid I don't have much experience in golang.

resmo avatar Feb 15 '22 16:02 resmo

Hi all! @DerekStrickland do you have any news about that?

bubejur avatar Sep 08 '22 06:09 bubejur

@DerekStrickland @tgross Hi! Any news?

bubejur avatar Oct 03 '22 08:10 bubejur

@bubejur we'll update issues when we're working on them. This isn't currently on our immediate roadmap.

tgross avatar Oct 03 '22 13:10 tgross

This would be helpful for us too! Thanks for suggesting this change.

kaspergrubbe avatar Mar 06 '23 18:03 kaspergrubbe

We upgraded our CI pipeline to use nomad cli 1.5.5 (from 1.4.4). With 1.4.4 we could simply call nomad job run <jobfile>. With 1.5.5 we always get the error "Error fetching deployment" after a couple of seconds. I guess we have to implement retry logic, because the pipeline always fails now.

So ... this feature would be much appreciated.

vkrebs-wktaa avatar May 25 '23 12:05 vkrebs-wktaa

Doubling down on @vkrebs-wktaa's comment. I just upgraded from 1.3.x to 1.5.6 and now I see this error almost every time I run nomad job run. Seems like a major regression.

josh-m-sharpe avatar May 25 '23 18:05 josh-m-sharpe

Hi all 👋

Just noting here that, while investigating #17320, I noticed that the deployment monitor was not outputting the actual error that happened, which makes it hard to understand what problem is happening, so I've opened #17348 to improve this.

A retry mechanism would indeed be helpful, so I'm keeping this one open.

lgfa29 avatar May 29 '23 18:05 lgfa29

Any news about this?

Lord-Y avatar Jul 09 '23 07:07 Lord-Y

As this software no longer has an OSI-approved license, closing...

resmo avatar Aug 11 '23 07:08 resmo