[WIP] Metric logger improvements
Main changes:
- Log on every step.
- Accumulate metrics correctly over gradient accumulation iterations.
- Scrap `log_memory_stats_every_n_steps` and consolidate it with the existing `log_every_n_steps`.
Still need to verify I didn't break anything; a rough sketch of the pattern I'm going for is below. If we like this approach, I can integrate it into the other recipes as well.
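As a minimal sketch (not the actual recipe code): metrics accumulate across iterations and get flushed as a single averaged entry every `log_every_n_steps` optimizer steps, with memory stats riding along instead of having their own cadence. It assumes a logger with a `log_dict(payload, step=...)` method; `MetricAccumulator`, `recipe.train_step`, and `recipe.memory_stats` are hypothetical stand-ins.

```python
from collections import defaultdict


class MetricAccumulator:
    """Accumulates scalar metrics over iterations and averages them on flush."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def update(self, metrics: dict) -> None:
        # Add one iteration's worth of scalars to the running totals.
        for name, value in metrics.items():
            self._sums[name] += value
            self._counts[name] += 1

    def flush(self) -> dict:
        # Average each metric over however many iterations it was recorded,
        # then reset so the next logging window starts clean.
        averaged = {name: self._sums[name] / self._counts[name] for name in self._sums}
        self._sums.clear()
        self._counts.clear()
        return averaged


def train_loop(recipe, metric_logger, log_every_n_steps: int = 1):
    # Hypothetical loop: `recipe.train_step` and `recipe.memory_stats` are
    # stand-ins, not real torchtune APIs. Memory stats get logged together
    # with the other metrics rather than on a separate schedule.
    accumulator = MetricAccumulator()
    for step, batch in enumerate(recipe.dataloader, start=1):
        loss = recipe.train_step(batch)
        accumulator.update({"loss": loss, **recipe.memory_stats()})
        if step % log_every_n_steps == 0:
            metric_logger.log_dict(accumulator.flush(), step=step)
```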
P.S. Our wandb logger test was never actually running, and it's broken.
Hey, can I help you here? Looks similar to what I was working on: https://github.com/pytorch/torchtune/pull/730
@tcapelle thanks, yeah, this actually started from trying to get the gradient accumulation test on #730 to pass and kind of expanded from there. If it's easiest for you, I'm happy to let you commandeer this PR so you don't have to go adding these changes to all the other recipes. Let me know what you'd prefer.
great work!