
[Bug | clearml-agent] `clearml-agent daemon --stop` crashes when 'default' queue has been removed

Open talajasi7 opened this issue 3 years ago • 2 comments

If the 'default' queue is accidentally removed from the queue list, the clearml-agent daemon --stop command crashes while trying to shut down the active agents when no further options are given after --stop. Specifically, I got the following error:

clearml_agent: ERROR: APIError: code 400/707: No queue is tagged as the default queue for this company

Also I noticed the problem persists after creating another 'default' queue.

talajasi7 avatar Oct 07 '21 17:10 talajasi7

Thanks @talajasi7, I was able to verify the issue. The main cause is that --queue is not specified, so the agent looks for a queue with the default tag; since there isn't one, we get an error. --stop should not actually look for the default queue at all... Maybe as a follow-up, if --queue is not specified, it should look for either a queue with the tag default or the name default, wdyt?
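
A minimal sketch of the fallback lookup suggested above, assuming queues are available as plain dictionaries with `name` and `tags` keys. The function name `resolve_default_queue` and the data shape are hypothetical and are not taken from the clearml-agent code base:

```python
from typing import Optional


def resolve_default_queue(queues: list) -> Optional[dict]:
    """Hypothetical helper: prefer a queue tagged 'default', then one named 'default'."""
    # First pass: a queue explicitly tagged as the default wins.
    for queue in queues:
        if "default" in queue.get("tags", []):
            return queue
    # Second pass: fall back to a queue whose name is 'default'.
    for queue in queues:
        if queue.get("name") == "default":
            return queue
    # Nothing matched: let the caller decide instead of raising here.
    return None


# Illustrative data only: no queue carries the tag, one is named 'default'.
queues = [
    {"name": "gpu", "tags": []},
    {"name": "default", "tags": []},
]
chosen = resolve_default_queue(queues)
print(chosen["name"] if chosen else "no default queue")
```

Returning `None` rather than raising lets a caller such as a stop routine carry on without a queue, which matches the point that --stop should not need the default queue at all.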

bmartinn avatar Oct 07 '21 20:10 bmartinn

If the active agents are registered internally in a collection, I think it would be a good idea to stop them in FIFO or LIFO order when no queue is specified (no need to look for a 'default' queue). I would also add an option to shut down all active agents (such as clearml-agent daemon --stop-all).
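
The ordering idea above could be sketched as follows, assuming the active agents are tracked as a list of IDs in registration order. The function `agents_to_stop` and its parameters are hypothetical, and `--stop-all` is only the option proposed in this comment, not an existing flag:

```python
from collections import deque


def agents_to_stop(agent_ids, lifo=False, stop_all=False):
    """Hypothetical helper: pick which registered agents to stop, and in what order."""
    pending = deque(agent_ids)  # registration order, oldest first
    if not pending:
        return []
    if stop_all:
        # Stop every agent, newest first for LIFO, oldest first for FIFO.
        return list(reversed(pending)) if lifo else list(pending)
    # Stop a single agent: the newest (LIFO) or the oldest (FIFO).
    return [pending.pop() if lifo else pending.popleft()]


# Illustrative agent IDs only.
active = ["agent-0", "agent-1", "agent-2"]
print(agents_to_stop(active))                 # FIFO: oldest agent
print(agents_to_stop(active, lifo=True))      # LIFO: newest agent
print(agents_to_stop(active, stop_all=True))  # every agent, oldest first
```

No queue lookup is involved in this selection, so it would keep working even when no queue is tagged as the default.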

talajasi7 avatar Oct 08 '21 08:10 talajasi7

Closing this as it was already released. Please reopen if required.

jkhenning avatar Mar 15 '23 13:03 jkhenning