DeepSpeed icon indicating copy to clipboard operation
DeepSpeed copied to clipboard

New integration - CometMonitor

Open alexkuzmik opened this issue 1 year ago • 2 comments

This PR introduces a new monitoring option - CometMonitor which comes up as an official integration with CometML.

The new monitor is covered with unit tests.

Notes:

  • We've updated docs/code-docs/source/monitor.rst but it doesn't look used anymore
  • We've updated the "Monitoring Module" section name in config-json.md to be generic so the next integration won't require updating it.

alexkuzmik avatar Apr 25 '24 19:04 alexkuzmik

@microsoft-github-policy-service agree company="Comet"

alexkuzmik avatar Apr 25 '24 19:04 alexkuzmik

Hi, @loadams! I hope I fixed the issues in my unit tests (they revealed themselves only in multi-gpu env). Could you please re-run the tests?

alexkuzmik avatar May 08 '24 14:05 alexkuzmik

Hi @loadams @alexkuzmik , @deepcharm confirmed a bug in this PR when determining the config monitor.enabled flag, and fixed it in new PR https://github.com/microsoft/DeepSpeed/pull/5633

nelyahu avatar Jun 09 '24 20:06 nelyahu