feature request: print "pending" in the ui in case of placement failures

Open Kamilcuk opened this issue 1 year ago • 6 comments
Proposal

I would like to request that the Nomad web UI show "pending", "Placement failure", or something similar when a job has placement failures.

Printing "failed" is confusing. The job has not failed; it is just waiting for resources, and it will run. All is fine. Moreover, "failed" is shown in red.

image

Use-cases

The use case is any time there are more jobs scheduled than resources available.

Attempted Solutions

No solutions attempted.

Thanks!

Kamilcuk avatar Oct 18 '24 12:10 Kamilcuk

When running batch job workloads, one would expect the cluster's resources to be fully used at some point, with some jobs queued to be executed later. So marking them as failed is IMHO confusing.

waldemarmeier avatar Oct 19 '24 08:10 waldemarmeier

Thanks for raising this issue, seems completely valid.

When setting statuses up in the UI, I intended to have placement failures covered by the "Deploying" status. However, batch/sysbatch jobs do not have deployments, so we show them as failed. The closest current convention is probably Recovering, so I may use that instead of pending, but will definitely look to make a change here soon.

philrenaud avatar Oct 23 '24 14:10 philrenaud

Hi, I noticed that the status "recovering" also shows up and is also confusing. "Recovering" shows while the job is downloading the Docker image, so the job is starting but not yet healthy. It's not "recovering"; recovering from what? The job is just starting for the first time, so it is "starting" or "running". My 2 cents: I find this status confusing as well.

I would propose showing just the job.Status field, as in previous versions. If I have to learn that "recovering" means starting, I can do that; that is fine for me too.

Would it be possible to add mouse-hover tooltips explaining the current status, like in the linked documentation? I usually find such tooltips super useful. Thank you.

Kamilcuk avatar Oct 23 '24 14:10 Kamilcuk

Hover tooltip/info is a great suggestion, thank you.

And yes, "recovering" makes a lot of sense for "This job isn't deploying, but a client went down and now it's trying to come back up" situations. Unfortunately for the UI, "batch/sysbatch job that is just starting up" looks a lot like this, too.

I'll have to think of something else here, as you're right that it's a thorn. I know we've toyed with the idea of "deployments" for batch/sysbatch jobs, but we shouldn't make the UI dependent on that. I'll consider alternatives.

philrenaud avatar Oct 23 '24 14:10 philrenaud

Thanks for raising this issue, seems completely valid.

When setting statuses up in the UI, I intended to have placement failures covered by the "Deploying" status. However, batch/sysbatch jobs do not have deployments, so we show them as failed. The closest current convention is probably Recovering, so I may use that instead of pending, but will definitely look to make a change here soon.

In this case "Queued" might be an option, as well.

waldemarmeier avatar Oct 23 '24 17:10 waldemarmeier

Agreeing on this one: it is confusing that the UI shows the job as an error, while Nomad's exported metrics (nomad_nomad_job_summary_queued) show it as Queued.

(The UI is a bit 'special' with its statuses anyway, I noticed: even when starting a new job, it seems to transition from Error to Recovering and then Running, which feels rather odd.)

dmclf avatar Oct 30 '24 01:10 dmclf
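
For illustration only, here is a rough sketch of how a display label could be derived from a job's per-task-group summary counts so that a job waiting on resources reads as "Queued" rather than "Failed". This is not the Nomad UI's actual logic. The `display_status` helper and its precedence order are assumptions made up for this example. The count names (`Queued`, `Starting`, `Running`, `Failed`, `Lost`) follow the job summary fields, and `job_status` stands for the plain job.Status value mentioned above.

```python
from typing import Mapping

# Hypothetical helper, NOT Nomad's UI code: picks a human-readable label
# from summary counts shaped like {"task-group": {"Queued": 3, ...}}.
def display_status(job_status: str,
                   summary: Mapping[str, Mapping[str, int]]) -> str:
    keys = ("Queued", "Starting", "Running", "Failed", "Lost")
    totals = {key: 0 for key in keys}
    for counts in summary.values():
        for key in keys:
            totals[key] += counts.get(key, 0)

    # Assumed precedence: anything in progress or waiting wins over failure,
    # so a job that is only waiting for resources is never labelled "Failed".
    if totals["Running"] > 0:
        return "Running"
    if totals["Starting"] > 0:
        return "Starting"
    if totals["Queued"] > 0:
        return "Queued"
    if totals["Failed"] > 0 or totals["Lost"] > 0:
        return "Failed"
    # Nothing to go on: fall back to the raw job status.
    return job_status


print(display_status("pending", {"batch-group": {"Queued": 3}}))
```

With this ordering, a batch job whose allocations are all waiting for placement is labelled "Queued", which matches what the exported queued metric already reports for such jobs.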