bhack
I continue to see too many spawned processes on other DDP models/code as well. With `torchrun` and a recent PyTorch nightly I see `361` processes with just 98 dataloaders.
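For reference, this is roughly how I count them (a minimal sketch, Linux-only, assuming the trainer and worker processes all show up under the name `pt_main_thread` as in the `pstree` output below):

```python
import os

# Count processes named "pt_main_thread" and sum their OS thread counts.
# Assumes Linux /proc; "pt_main_thread" is the name pstree reports for the
# trainer and dataloader worker processes in this job.
procs, threads = 0, 0
for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        with open(f"/proc/{pid}/status") as f:
            status = f.read()
    except OSError:
        continue  # the process may have exited in the meantime
    if status.startswith("Name:\tpt_main_thread"):
        procs += 1
        for line in status.splitlines():
            if line.startswith("Threads:"):
                threads += int(line.split()[1])
                break

print(f"pt_main_thread processes: {procs}, total threads: {threads}")
```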
I'm testing again with new nightlies and I still see the same behaviour. @malfet How can we make progress on this?
/cc @andrewkho @gokulavasan Can you help investigate this behavior change?
I expect not to have more `pytorch` processes than NGPUs × dataloader `num_workers`.
On older PyTorch versions the number of pytorch processes correlates with dataloader `num_workers`.
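As a back-of-the-envelope sketch (assuming one trainer process per GPU plus `num_workers` dataloader worker processes per trainer; the exact per-rank bookkeeping may differ):

```python
# Expected process count for a single-node DDP job launched with torchrun:
# one trainer process per GPU, each spawning num_workers dataloader workers.
ngpus = 8        # GPUs on the node
num_workers = 5  # DataLoader(num_workers=...)

trainers = ngpus
workers = ngpus * num_workers
print(f"expected pytorch processes ~ {trainers + workers}")  # 48, far from the hundreds observed
```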
e.g. `pstree` output with a PyTorch nightly, same job, 8 GPUs and dataloader `num_workers=5` (DDP):

```
pt_main_thread-+-pt_main_thread-+-32*[pt_main_thread---{pt_main_thread}]
               |                |-5*[pt_main_thread---67*[{pt_main_thread}]]
               |                `-128*[{pt_main_thread}]
               |-7*[pt_main_thread-+-32*[pt_main_thread---{pt_main_thread}]]
               |                   |-5*[pt_main_thread---67*[{pt_main_thread}]]]
               |                   `-126*[{pt_main_thread}]]
               |-python
               `-64*[{pt_main_thread}]
```
@andrewkho Since you want to keep the discussion here: is this an issue related to the worker processes or to the number of threads (see the other mentioned PR https://github.com/pytorch/pytorch/pull/126199)?...
As you can see, it was already reproduced by @tringwald as well. I don't know whether the regression is in the dataloader or not, but we have a lot of threads. How we...
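One way to separate the two effects (a sketch of my own, not something from the linked PR) is to cap the intra-op threads inside each dataloader worker via `worker_init_fn` and check whether the per-worker thread count drops, assuming the extra threads come from per-worker thread pools:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def limit_worker_threads(worker_id: int) -> None:
    # Restrict the intra-op thread pool inside every dataloader worker process.
    torch.set_num_threads(1)

dataset = TensorDataset(torch.arange(1000, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=10, num_workers=5,
                    worker_init_fn=limit_worker_threads)

for _ in loader:
    pass  # iterate once so the worker processes actually spawn
```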
The environment where you can reproduce this is at https://github.com/pytorch/pytorch/issues/118865#issuecomment-1924745100
OK, I will take it from the nightly build, but I need to wait for another training session to really test it.