metaseq

Include run id in train.log

Open suchenzang opened this issue 1 year ago • 0 comments

Right now, our JSON logging emits lines like the following:

2022-10-31 09:47:44 | INFO | train_inner | {"epoch": 2, "actv_norm": "300.911", "pos_norm": "0.36", "tok_norm": "0.878", "emb_norm": "0.002", "docsperex": "7.02", "loss": "3.814", "ppl": "14.07", "wps": "319347", "ups": "0.15", "wpb": "2.09715e+06", "bsz": "1024", "num_updates": "5685", "lr": "9.6531e-05", "gnorm": "0.231", "clip": "0", "train_wall": "7", "cuda_gb_allocated": "18.1", "cuda_gb_reserved": "61.6", "cuda_gb_free": "61.1", "wall": "40428"}

Proposed change:

  • Add a run_id field to each JSON log entry (e.g. "run_id": "8") that reflects the value passed in when configuring the run (default 0).
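
A minimal sketch of what the proposed field could look like when the payload is serialized. The helper name `format_train_log` and its `run_id` parameter are hypothetical and are not part of metaseq's actual logging code; placing `run_id` first in the payload is also just an illustrative choice.

```python
import json


def format_train_log(stats, run_id=0):
    """Serialize a stats dict to JSON with a run_id field added.

    Hypothetical helper for illustration; not metaseq's real logger.
    run_id defaults to 0, matching the default proposed in this issue,
    and is written as a string like the other values in the log line.
    """
    payload = {"run_id": str(run_id)}
    payload.update(stats)  # keep the existing stats unchanged after run_id
    return json.dumps(payload)


if __name__ == "__main__":
    print(format_train_log({"epoch": 2, "loss": "3.814"}, run_id=8))
```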

This will make it much easier to stitch runs together later, when we do a series of rollbacks or in-flight changes.
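
As a rough illustration of the stitching use case, the sketch below groups parsed log entries by `run_id`. It assumes the `timestamp | level | tag | {json}` layout shown in the sample line above; the function names are hypothetical, and entries without a `run_id` are assumed to belong to the default run "0".

```python
import json
from collections import defaultdict


def parse_log_line(line):
    """Return the JSON payload of a 'timestamp | level | tag | {json}' line, or None."""
    parts = line.split(" | ", 3)
    if len(parts) != 4:
        return None
    try:
        payload = json.loads(parts[3])
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None


def group_by_run_id(lines, default_run_id="0"):
    """Group log payloads by run_id, preserving their order within each run."""
    runs = defaultdict(list)
    for line in lines:
        payload = parse_log_line(line.strip())
        if payload is None:
            continue  # skip lines that are not JSON log entries
        runs[payload.get("run_id", default_run_id)].append(payload)
    return dict(runs)
```

With the entries grouped this way, a rollback shows up as a new `run_id` whose `num_updates` values overlap those of an earlier run.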

suchenzang Oct 31 '22 09:10