Error with @ex.main and if __name__ == '__main__':
When I use @ex.main together with if __name__ == '__main__': and call the main function directly, MongoObserver collects no data.
Here is minimal code to reproduce the problem:
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('OBB_Swin')
ex.observers.append(MongoObserver(url='localhost:27017', db_name='OBB'))

@ex.main
def my_main():
    print('test')

if __name__ == '__main__':
    # ex.run_commandline() # correct
    # ex.run() # correct
    my_main()
Looking forward to your reply!
There are reasons I can't use ex.run_commandline() or ex.run(). ex.run_commandline() doesn't work with an existing argparse setup, and ex.run() doesn't work with multi-GPU training (for example: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py)
Hi @HanGuangXin! Happy new year! Unfortunately, you have to use ex.run (or ex.run_commandline) for everything to work. ex.run contains the code to set up the configuration and observers. @ex.main doesn't modify my_main, it just registers it as the default main function for ex.run.
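To illustrate why the direct call records nothing, here is a toy model in plain Python. It is not Sacred's actual implementation: `MiniExperiment` and `ListObserver` are made-up names, and the observer methods are heavily simplified. It only sketches the idea that the decorator registers the function and hands it back unchanged, while all observer handling lives in `run`.

```python
class ListObserver:
    """Hypothetical observer that just records events in memory."""

    def __init__(self):
        self.events = []

    def started_event(self, name, config):
        self.events.append(("started", name, dict(config)))

    def completed_event(self, result):
        self.events.append(("completed", result))


class MiniExperiment:
    """Toy stand-in for an experiment object (assumption, not Sacred code)."""

    def __init__(self, name):
        self.name = name
        self.observers = []
        self.default_command = None

    def main(self, func):
        # Register the function as the default command and return it
        # unchanged, so calling it directly knows nothing about observers.
        self.default_command = func
        return func

    def run(self, config_updates=None):
        # Only this code path notifies the observers.
        config = dict(config_updates or {})
        for obs in self.observers:
            obs.started_event(self.name, config)
        result = self.default_command()
        for obs in self.observers:
            obs.completed_event(result)
        return result


ex = MiniExperiment("OBB_Swin")
observer = ListObserver()
ex.observers.append(observer)


@ex.main
def my_main():
    return "test"


my_main()                                   # plain function call
direct_call_events = list(observer.events)  # nothing was recorded

ex.run()                                    # goes through the experiment
run_events = list(observer.events)          # started + completed events
```

In this sketch `direct_call_events` stays empty, while `run_events` holds one "started" and one "completed" entry, which mirrors the behaviour reported above with MongoObserver.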
For the multi-GPU training: what exactly is not working and do you know why?
+1
Multi-GPU training is used more and more frequently nowadays, but it does not work with Sacred, because there are additional arguments on the command line that launches Python, as @HanGuangXin mentioned: python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/train.py
+1 Making Sacred work alongside torch multiprocessing is an absolute pain.