
Uncouple multiprocessing settings

Open · ejm714 opened this issue 3 years ago · 1 comment

Right now, we set the multiprocessing_context for the Trainer based on the num_workers used for the data loader:

https://github.com/drivendataorg/zamba/blob/master/zamba/pytorch_lightning/utils.py#L67-L71

https://github.com/drivendataorg/zamba/blob/master/zamba/models/model_manager.py#L283-L286
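Roughly, the coupling looks like the sketch below. This is a simplified illustration of the behavior described above, not the actual code at the links; the strategy string and Trainer arguments are assumptions for the sake of the example.

```python
# Illustrative sketch only (not the actual zamba code): the data loader's
# num_workers value is reused to decide whether to set a multiprocessing
# strategy on the Trainer, even when training on a single GPU.
import pytorch_lightning as pl

num_workers = 4  # configured for data loading

trainer_kwargs = {"accelerator": "gpu", "devices": 1}
if num_workers > 0:
    # the strategy is inferred from a dataloader setting rather than
    # being an explicit, independent option
    trainer_kwargs["strategy"] = "ddp_spawn"

trainer = pl.Trainer(**trainer_kwargs)
```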

It would be good to separate these out for a couple of reasons:

  • it lets us use multiple workers for data loading without needing to set a multiprocessing strategy for the Trainer when running on only a single GPU
  • we've only trained models on a single GPU, so it's not clear that multiprocessing for the model is fully and properly configured
  • PyTorch Lightning is currently making a lot of changes to its accelerators and strategies for distributed training, so it would be nice to let those settle a bit before supporting multi-GPU training in zamba

Implementation thoughts:

  • do not infer the multiprocessing context from num_workers (only use num_workers for the dataloaders and to determine persistent_workers)
  • consider adding a multiprocessing strategy field on the train config object that defaults to the PTL default; another option is to expose this as a boolean and let zamba determine the best strategy/accelerator combination (see the sketch below)
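To make that concrete, here is a minimal sketch of what the decoupled settings could look like, assuming a dataclass-style config; the field names (`num_workers`, `strategy`) and helper functions (`build_dataloader`, `build_trainer`) are hypothetical and not the actual zamba API:

```python
# Hypothetical sketch of decoupled settings (names are illustrative,
# not the actual zamba config API).
from dataclasses import dataclass
from typing import Optional

import pytorch_lightning as pl
from torch.utils.data import DataLoader


@dataclass
class TrainConfig:
    # num_workers only affects the DataLoader, never the Trainer.
    num_workers: int = 3
    # Explicit, optional Trainer strategy; None means "use the
    # PyTorch Lightning default" rather than inferring from num_workers.
    strategy: Optional[str] = None


def build_dataloader(dataset, config: TrainConfig) -> DataLoader:
    return DataLoader(
        dataset,
        batch_size=8,
        num_workers=config.num_workers,
        # persistent_workers is derived from num_workers, as proposed above.
        persistent_workers=config.num_workers > 0,
    )


def build_trainer(config: TrainConfig) -> pl.Trainer:
    trainer_kwargs = {"accelerator": "gpu", "devices": 1}
    # Only pass a strategy if one was explicitly requested.
    if config.strategy is not None:
        trainer_kwargs["strategy"] = config.strategy
    return pl.Trainer(**trainer_kwargs)
```

With this shape, multi-GPU support could later be added by setting the strategy field explicitly (or by a boolean that lets zamba pick the strategy/accelerator combination), without touching the data loading settings.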

ejm714 · Sep 28 '22 18:09

Hey, @sambujangfofana and I are students from the University of Michigan. We are currently working on a project in which we have to contribute to a GitHub repository (https://eecs481.org/hw6.html). We are pretty interested in this issue and would like to work on it. We hope to submit a pull request this week. Could we be assigned this issue?

aaronphilip19 · Apr 22 '24 03:04