Sanitize Metric Name in Checkpoints
Summary:
Context
Metric names may be included in checkpoint names when specifying a best_checkpoint_config, but no verification is done on the metric name. This may lead to nested directory structures if checkpoint names contain /, e.g.:
f721785233
Here we use top1_accuracy/evaluate as the monitored_metric, which will create checkpoints in a nested directory:
{F1977112918}
Checkpointers won't be able to appropriately restore the checkpoint with the best monitored metric, as each checkpoint will be stored in a different directory.
Proposed change
In this diff, we sanitize the metric name prior to checkpoint saving, replacing / with _. Now, checkpoints are saved in the same directory:
f721793003 {F1977113027}
Differential Revision: D73004419
This pull request was exported from Phabricator. Differential Revision: D73004419