scalars sometimes not reported - only with sleep, flush

Open gorogm opened this issue 1 year ago • 3 comments

Thank you for helping us make ClearML better!

Describe the bug

Some scalars do not appear on https://app.clear.ml/projects after a training run. When I run trainings one after another, the scalars appear for some runs and not for others. These scalars summarize the results, so losing them makes the whole training useless. I could work around it by adding some time.sleep(5) and task.flush() calls to my code. I've read the similar issue https://github.com/allegroai/clearml/issues/446, and while the technical reason is explained there, I think it's still a bug.

To reproduce

I couldn't come up with a short script that reproduces it in >50% of cases, sorry. I ran a training in vscode/jupyter that generates some plots, and the scalars reported after the plot are most often missing from the dashboard afterwards. If I insert a time.sleep(5) between the plot and the scalar reporting, and end the code with time.sleep(5), task.flush(), task.close(), then the problem goes away.
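For reference, the workaround described above might look roughly like the sketch below. The helper name report_and_close and its parameters are hypothetical and only there for illustration; get_logger, report_scalar, flush and close are the ClearML SDK calls already mentioned in this report. The 5 second pause is simply the value that happened to work here, not a recommended setting.

```python
import time


def report_and_close(task, scalars, settle_seconds=5.0, sleep=time.sleep):
    """Report summary scalars, then pause, flush and close the task.

    `task` is assumed to be a ClearML Task (e.g. from Task.init(...)).
    `scalars` is an iterable of (title, series, value, iteration) tuples.
    `settle_seconds` and `sleep` are hypothetical knobs for this sketch.
    """
    logger = task.get_logger()
    for title, series, value, iteration in scalars:
        logger.report_scalar(
            title=title, series=series, value=value, iteration=iteration
        )

    # Workaround from this report: wait a little before flushing,
    # otherwise the scalars were often missing from the dashboard.
    sleep(settle_seconds)
    task.flush()
    task.close()


# Hypothetical usage in a notebook cell (project/task names are made up):
#   from clearml import Task
#   task = Task.init(project_name="demo", task_name="training")
#   ... training and plotting ...
#   report_and_close(task, [("summary", "accuracy", 0.93, 0)])
```

Passing the task in as an argument keeps the helper independent of how the task was created, so the same cell can be reused across consecutive trainings.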

Expected behaviour

Scalars should be uploaded without any sleep and flush statements.

Environment

  • Server type (self hosted \ app.clear.ml) app.clear.ml
  • ClearML SDK Version - 1.6.4
  • ClearML Server Version (Only for self hosted). --
  • Python Version - 3.8
  • OS (Windows \ Linux \ Macos) - Linux

Related Discussion

gorogm avatar Sep 05 '22 09:09 gorogm

Hi @gorogm ,

Usually clearml pushes metric reports periodically, when new reports come in, and on task shutdown (to clear up any pending reports). I assume that in your case we're talking about reports left pending because the task did not close automatically (as this is jupyter/vscode and the script does not terminate). In this case, it is indeed recommended to call task.close() to signal the end of training and let clearml wrap things up...

jkhenning avatar Sep 05 '22 13:09 jkhenning

Hi! Thanks for your answer! The thing is that task.close() by itself is not enough :( With only a single task.close(), the scalars are still not uploaded.

gorogm avatar Sep 05 '22 14:09 gorogm

This is something we'll look into as closing the task should invoke a flush. I'll ask someone to take a look.

jkhenning avatar Sep 05 '22 14:09 jkhenning