firefox-translations-training
Tracking does not support overriding a run: wandb [409] run was previously created and deleted
Publishing from a Taskcluster group with the --overide-runs argument manages to delete the existing runs of the group, but fails to create the new runs:
wandb: ERROR Error while calling W&B API: run teacher-1_dziji was previously created and deleted; try a new run name (<Response [409]>)
Note: it is the ID that conflicts here, not the name as the message above suggests.
Furthermore, the client stays stuck for 90 seconds before failing:
wandb.errors.CommError: Run initialization has timed out after 90.0 sec.
This is annoying because we cannot both identify runs by a unique ID (<name>_<group_id>) and allow overriding a run in an existing project. Unfortunately, deleting all artifacts from the project does not seem to fix this. A possible quick fix would be to detect this exception and retry with a suffix (the name and ID would then be teacher-1_dziji_1, teacher-1_dziji_2, …). That should work, although the resulting display is not ideal and may be confusing, so we should at least consider documenting it.
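The retry-with-suffix idea could be sketched roughly as follows. This is only an illustration, not the project's implementation: `candidate_ids`, `init_with_suffix`, `init_run`, `conflict_errors` and `max_retries` are all hypothetical names, and the function that actually creates the run is passed in so the sketch does not depend on the W&B client.

```python
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def candidate_ids(base_id: str, max_retries: int) -> Iterator[str]:
    """Yield base_id first, then base_id_1, base_id_2, ... up to max_retries."""
    yield base_id
    for n in range(1, max_retries + 1):
        yield f"{base_id}_{n}"


def init_with_suffix(
    init_run: Callable[[str], T],
    base_id: str,
    conflict_errors: Tuple[Type[BaseException], ...] = (Exception,),
    max_retries: int = 3,
) -> Tuple[str, T]:
    """Try to create a run, appending a numeric suffix after each conflict.

    init_run is a hypothetical callable that creates the run for a given ID
    and raises one of conflict_errors when that ID was previously deleted.
    Returns the ID that was finally accepted and the created run.
    """
    last_error = None
    for run_id in candidate_ids(base_id, max_retries):
        try:
            return run_id, init_run(run_id)
        except conflict_errors as error:
            # Assumption: any error listed in conflict_errors means the ID is taken.
            last_error = error
    raise RuntimeError(
        f"Could not create a run for {base_id} after {max_retries} retries"
    ) from last_error
```

With the W&B client, `init_run` might wrap `wandb.init(id=run_id, name=run_id, ...)` and `conflict_errors` might be `(wandb.errors.CommError,)`, the exception shown in the log above. Since that exception is a generic communication error, and each failed attempt may still block for the 90 s timeout, a real implementation would probably need a stricter check than this sketch.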
I think W&B disallows overriding a run because it keeps the data for 7 days so that deleted runs can be restored (see this issue: https://github.com/wandb/wandb/issues/6395). In the worst case, we could clean everything now (with --overide-runs), then hope that re-uploading in a week works. It would be worth contacting the W&B team about this.
I suppose we never ran into this before we started using similar names and IDs to identify runs in the bar charts.