bhack
There was a PR at https://github.com/tensorflow/models/pull/10367
https://github.com/tensorflow/models/blob/master/official/projects/yolo/ops/kmeans_anchors.py
https://github.com/tensorflow/models/blob/master/official/projects/yolo/ops/kmeans_anchors_test.py
You could reproduce it with https://github.com/pytorch/pytorch/issues/118865#issuecomment-1924745100 using the 8/16 GPU params. I've tested it on a 94-CPU instance.
Are you able to reproduce it? I suppose so, as the instructions give a reproducible container/image sequence.
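As a side note, this is a minimal sketch of how one could count the Python processes while the repro is running (my own assumption about how to measure it, not part of the original repro; it needs `psutil` installed):

```python
# Hypothetical helper to count live Python processes during the repro run.
# Assumes `pip install psutil`; not taken from the linked issue.
import psutil

def count_python_processes() -> int:
    count = 0
    for proc in psutil.process_iter(["name"]):
        try:
            name = proc.info["name"] or ""
            if "python" in name.lower():
                count += 1
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return count

print(f"Python processes currently alive: {count_python_processes()}")
```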
Can you try re-running the same thing with PyTorch 2.0.x? This number of processes appeared with the PyTorch Docker image upgrades, without changing anything in the repo.
It's not a problem if you run it on a smaller machine, as I see the same behavior on 8 GPUs as well.
It's reproducible also with fewer GPUs and fewer dataloader workers. E.g. I've tested the same thing with 4 dataloader workers.

> Maybe there are some other processes being spawned by the...
I am testing this on every PyTorch nightly container and the number of spawned processes is very high. With `6` data workers and `16 GPUs` (DDP or DP) I have 600+ processes...
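To spell out why that number looks suspicious, here is my own back-of-the-envelope estimate (an assumption about the expected layout, ignoring pin_memory threads and other helpers, not an official formula): with DDP you'd expect roughly one main process per rank plus its dataloader workers.

```python
# Rough expected process count for DDP (one process per GPU rank).
# This ignores pin_memory threads and any extra helper processes -- an
# assumption for illustration, not the official accounting.
world_size = 16      # number of GPUs / DDP ranks
num_workers = 6      # dataloader workers per rank

expected = world_size * (1 + num_workers)  # main process + workers, per rank
print(expected)      # 112, versus the 600+ processes actually observed
```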
@tringwald Also, in your case, how are you going from 8 GPUs and 16 dataloaders to ~400 Python processes? It seems that @ejguan is no longer the DataLoader codeowner, right?...
/cc @fmassa @soumith Can we update the owner?