Remote aim server leftover checkpoints
🐛 Bug
I have the following setup:
- On my Synology NAS, I have Aim running as a Docker container.
- I can reach the server via its (local) IP address, 192.168.0.117:53800
- I can configure my PyTorch Lightning models to use the AimLogger with repo=aim://192.168.0.117:53800 (roughly as in the sketch below).
- Everything works, and runs are stored correctly.
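For reference, my logger setup looks roughly like this (a minimal sketch; the experiment name is a placeholder, and other Trainer arguments are omitted):

from aim.pytorch_lightning import AimLogger
import lightning.pytorch as pl

# Point AimLogger at the remote Aim server instead of a local .aim repo
aim_logger = AimLogger(repo='aim://192.168.0.117:53800', experiment='my-experiment')
trainer = pl.Trainer(logger=aim_logger)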
However, the 'latest checkpoint' is always stored locally instead of on the server.
Inside the repository I start my code from, a folder named aim: is created (yes, including the colon).
You can see the results in the screenshot.
It seems to be some leftover checkpoints from Aim? I'm not sure. Inspecting the output in Aim shows no signs of issues; everything seems to be in order.

To reproduce
See above
Expected behavior
I expect nothing to be logged locally; everything should be stored on the remote Aim server.
Environment
- Aim Version: 3.16.0, running in server mode inside a Docker container (using the official Aim Docker image)
- Python version: 3.10.8
- pip version: 22.2.2
- OS: macOS Monterey 12.6.3
Hey @vanhumbeecka! Thanks for submitting the issue. In fact, Aim does not support storing checkpoints just yet (as there is no artifact support). On the other hand, the Lightning trainer implementation has some fairly involved logic for selecting the save_dir. You can check it here.
@tmynn, @mahnerak, I recall you had some ideas on how this can be worked around? Please share your thoughts.
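For context, the save_dir selection being referred to works roughly like the simplified sketch below when ModelCheckpoint has no explicit dirpath (this is an illustrative approximation, not the actual Lightning source; resolve_ckpt_dir is a made-up stand-in name):

import os

def resolve_ckpt_dir(trainer):
    # If a logger exposes a save_dir, Lightning builds the checkpoint path from it.
    # For an AimLogger pointed at a remote server, that value may not be a usable
    # local path, yet it still gets joined into a local directory name.
    if trainer.loggers and trainer.loggers[0].save_dir is not None:
        logger = trainer.loggers[0]
        version = logger.version if isinstance(logger.version, str) else f"version_{logger.version}"
        return os.path.join(logger.save_dir, str(logger.name), version, "checkpoints")
    # Otherwise it falls back to the trainer's default_root_dir.
    return os.path.join(trainer.default_root_dir, "checkpoints")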
@vanhumbeecka
I handle this by explicitly setting up a Lightning ModelCheckpoint callback. When I do this, Lightning doesn't try to interpret the AimLogger's logger.save_dir as a local path.
import lightning.pytorch as pl
from lightning.pytorch.callbacks import ModelCheckpoint

callbacks = []
# With an explicit dirpath, checkpoints no longer land in the logger's save_dir
callbacks.append(ModelCheckpoint(dirpath='my/local/chkpts', filename='{epoch}.ckpt',
                                 monitor='val_loss', mode='min'))
trainer = pl.Trainer(callbacks=callbacks)  # plus your other Trainer arguments
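For completeness, a sketch of how this combines with the remote AimLogger from the report above (the experiment name is a placeholder):

from aim.pytorch_lightning import AimLogger

# Runs still go to the remote Aim server; checkpoints stay under the dirpath above
aim_logger = AimLogger(repo='aim://192.168.0.117:53800', experiment='my-experiment')
trainer = pl.Trainer(logger=aim_logger, callbacks=callbacks)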