
Pytorch Lightning run is marked as finished after .fit loop

Open Michael-Tanzer opened this issue 1 year ago • 5 comments

🐛 Bug

When using the PyTorch Lightning Aim logger, a run is marked as finished after the fit loop, ignoring the test loop and any metrics logged there.

Expected behavior

The logger should mark the run as finished only on exit, after the test loop and any additional logging.

Environment

  • Aim Version: 3.19.2
  • Python version: 3.10.8
  • Lightning version: 2.0.1
  • pip version: 22.3.1
  • OS: Linux

Michael-Tanzer • Apr 12 '24 16:04

Hey @Michael-Tanzer! Thanks for the report. The run is closed because PyTorch Lightning calls the .finalize() method on the logger. However, when the test loop starts and the trainer logs any additional metrics, the aim.Run is reopened. I've tested this on our example (https://github.com/aimhubio/aim/blob/main/examples/pytorch_lightning_track.py), and the test loss is tracked successfully. Could you please double-check whether the test metrics are tracked?

mihran113 • Apr 16 '24 14:04

Hi, I'm glad it's working on this example, but there is also another ticket with pretty much the same issue. Could it be related to the fact that I am using a remote server? My current fix is to disable finalize and later finalize the run manually.
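
A minimal sketch of that kind of workaround, assuming the logger exposes a finalize(status) method that the trainer calls at the end of fit. The names DeferredFinalizeMixin, finalize_now, and StandInLogger are hypothetical and not part of Aim or Lightning; the stand-in class only exists so the sketch runs without either library installed.

```python
class DeferredFinalizeMixin:
    """Ignore the trainer-triggered finalize() and close the run on request."""

    def finalize(self, status="success"):
        # Called by the trainer when fit() ends: remember the status,
        # but do not close the run yet.
        self._deferred_status = status

    def finalize_now(self):
        # Call once, after trainer.test() and any extra logging.
        status = getattr(self, "_deferred_status", "success")
        super().finalize(status)


class StandInLogger:
    """Hypothetical stand-in for a real logger (assumption, not Aim's API)."""

    def __init__(self):
        self.closed = False
        self.final_status = None
        self.logged = []

    def log_metrics(self, metrics, step=None):
        # Refuse to log once closed, to make premature closing visible.
        if self.closed:
            raise RuntimeError("run is already finished")
        self.logged.append((metrics, step))

    def finalize(self, status):
        self.closed = True
        self.final_status = status


class DeferredStandInLogger(DeferredFinalizeMixin, StandInLogger):
    """The mixin comes first so its finalize() shadows the base one."""


# With Aim installed, the same mixin could be combined with the real logger
# (untested assumption):
#   from aim.pytorch_lightning import AimLogger
#   class DeferredAimLogger(DeferredFinalizeMixin, AimLogger): ...

logger = DeferredStandInLogger()
logger.finalize("success")                # what the trainer does after fit()
logger.log_metrics({"test_loss": 0.25})   # the test loop can still log
logger.finalize_now()                     # close the run explicitly
```

The design choice here is to keep the trainer unaware of the change: it still calls finalize(), and the run is closed only when finalize_now() is invoked after testing.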

Michael-Tanzer • Apr 16 '24 14:04

#3097

Michael-Tanzer • Apr 16 '24 14:04

Oh, yeah, remote tracking is actually causing this. I've just opened a PR (https://github.com/aimhubio/aim/pull/3134) that should address it. We'll release a patch version today or tomorrow that includes the fix for this issue.

mihran113 • Apr 16 '24 15:04

Thank you! This is awesome news! I will close this issue then.

Michael-Tanzer • Apr 16 '24 15:04