metaseq
Include run id in train.log
Right now, our JSON logging emits lines like the following:
2022-10-31 09:47:44 | INFO | train_inner | {"epoch": 2, "actv_norm": "300.911", "pos_norm": "0.36", "tok_norm": "0.878", "emb_norm": "0.002", "docsperex": "7.02", "loss": "3.814", "ppl": "14.07", "wps": "319347", "ups": "0.15", "wpb": "2.09715e+06", "bsz": "1024", "num_updates": "5685", "lr": "9.6531e-05", "gnorm": "0.231", "clip": "0", "train_wall": "7", "cuda_gb_allocated": "18.1", "cuda_gb_reserved": "61.6", "cuda_gb_free": "61.1", "wall": "40428"}
Proposed change:
- Add a run_id to the log (e.g. `"run_id": "8"`) that reflects what is passed in when configuring runs (default 0).
This will make stitching runs together much easier in the future, when a job goes through a series of rollbacks or in-flight changes.
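To make the proposal concrete, here is a minimal sketch of what tagging each stats record with a run id, and later stitching logs on it, could look like. It is not metaseq's actual logging code: `format_log_line`, `parse_log_line`, and `stitch` are hypothetical helpers, and the rule that a higher `run_id` supersedes a lower one on overlapping `num_updates` is an assumption about how rollbacks would be resolved.

```python
import json


def format_log_line(timestamp, tag, stats, run_id=0):
    """Render one log line in the existing format, with run_id added.

    Hypothetical helper: run_id is stringified to match the other values
    in the current JSON output, and defaults to 0 as proposed.
    """
    tagged = {**stats, "run_id": str(run_id)}
    return f"{timestamp} | INFO | {tag} | {json.dumps(tagged)}"


def parse_log_line(line):
    """Extract the JSON payload, which is the fourth ' | '-separated field."""
    payload = line.split(" | ", 3)[3]
    return json.loads(payload)


def stitch(lines):
    """Merge records from several runs into one series keyed by num_updates.

    Assumption: after a rollback, the record from the highest run_id wins
    for any update step that was logged more than once.
    """
    latest = {}
    for line in lines:
        rec = parse_log_line(line)
        step = int(rec["num_updates"])
        if step not in latest or int(rec["run_id"]) >= int(latest[step]["run_id"]):
            latest[step] = rec
    return [latest[step] for step in sorted(latest)]


if __name__ == "__main__":
    lines = [
        format_log_line("2022-10-31 09:47:44", "train_inner",
                        {"num_updates": "5685", "loss": "3.814"}, run_id=7),
        format_log_line("2022-10-31 11:02:10", "train_inner",
                        {"num_updates": "5685", "loss": "3.700"}, run_id=8),
    ]
    for rec in stitch(lines):
        print(rec)
```

With the run id in every record, a rolled-back segment can be dropped by filtering on `run_id` instead of guessing from timestamps or wall-clock gaps.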