
Sanitize Metric Name in Checkpoints

Open • vbourgin opened this issue 1 year ago • 1 comment

Summary:

Context

Metric names may be included in checkpoint names when a best_checkpoint_config is specified, but the metric name is never validated. If the metric name contains /, the resulting checkpoint path gains nested directories, e.g.:

f721785233

Here we use top1_accuracy/evaluate as the monitored_metric, which creates checkpoints in a nested directory:

{F1977112918}

As a result, checkpointers cannot correctly restore the checkpoint with the best monitored metric, because each checkpoint is stored in a different directory.
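A minimal sketch of the failure mode, assuming a hypothetical naming scheme in which the raw metric name and value are embedded in the checkpoint directory name as `{metric_name}={value}`. The helpers `checkpoint_name` and `save_and_list` are illustrative only and are not part of the TNT API; the real checkpoint path format may differ.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def checkpoint_name(epoch: int, step: int, metric_name: str, metric_value: float) -> str:
    # Hypothetical naming scheme (not TNT's actual format): the raw metric name
    # and its value are embedded directly in the checkpoint directory name.
    return f"epoch_{epoch}_step_{step}_{metric_name}={metric_value}"


def save_and_list(metric_name: str, values: List[float]) -> List[str]:
    # Create one checkpoint directory per epoch, then return what a checkpointer
    # scanning only the top-level directory would see.
    with tempfile.TemporaryDirectory() as root:
        for epoch, value in enumerate(values):
            name = checkpoint_name(epoch, epoch * 100, metric_name, value)
            # parents=True lets a "/" in the name silently create nested directories.
            (Path(root) / name).mkdir(parents=True)
        return sorted(os.listdir(root))


print(save_and_list("top1_accuracy/evaluate", [0.71, 0.74]))
# → ['epoch_0_step_0_top1_accuracy', 'epoch_1_step_100_top1_accuracy']
```

Neither top-level entry carries a metric value: each `evaluate=...` component ends up inside its own parent directory, so nothing at the top level can be ranked by the monitored metric.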

Proposed change

In this diff, we sanitize the metric name before saving the checkpoint, replacing / with _. Checkpoints are now saved in the same directory:

f721793003 {F1977113027}
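The sanitization step can be sketched as follows, again assuming the same hypothetical `{metric_name}={value}` naming scheme; `sanitize_metric_name`, `checkpoint_name`, and `save_and_list` are illustrative names, not the functions added in this diff.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def sanitize_metric_name(metric_name: str) -> str:
    # Replace "/" with "_" so the metric name stays a single path component.
    return metric_name.replace("/", "_")


def checkpoint_name(epoch: int, step: int, metric_name: str, metric_value: float) -> str:
    # Hypothetical naming scheme (not TNT's actual format); the metric name is
    # sanitized before being embedded in the checkpoint directory name.
    return f"epoch_{epoch}_step_{step}_{sanitize_metric_name(metric_name)}={metric_value}"


def save_and_list(metric_name: str, values: List[float]) -> List[str]:
    # Create one checkpoint directory per epoch and list the top-level entries.
    with tempfile.TemporaryDirectory() as root:
        for epoch, value in enumerate(values):
            (Path(root) / checkpoint_name(epoch, epoch * 100, metric_name, value)).mkdir(parents=True)
        return sorted(os.listdir(root))


names = save_and_list("top1_accuracy/evaluate", [0.71, 0.74])
print(names)
# → ['epoch_0_step_0_top1_accuracy_evaluate=0.71', 'epoch_1_step_100_top1_accuracy_evaluate=0.74']

# Every checkpoint is now a direct child of one directory, so ranking them
# only requires parsing the value after the last "=".
best = max(names, key=lambda n: float(n.rsplit("=", 1)[1]))
print(best)
# → epoch_1_step_100_top1_accuracy_evaluate=0.74
```

One trade-off of a plain character replacement is that it is not injective: top1_accuracy/evaluate and top1_accuracy_evaluate map to the same sanitized name, which is harmless as long as only one metric is monitored per run.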

Differential Revision: D73004419

vbourgin, Apr 15 '25 02:04

This pull request was exported from Phabricator. Differential Revision: D73004419

facebook-github-bot, Apr 15 '25 02:04