bhack

Results 1417 comments of bhack

There was a PR at https://github.com/tensorflow/models/pull/10367

https://github.com/tensorflow/models/blob/master/official/projects/yolo/ops/kmeans_anchors.py https://github.com/tensorflow/models/blob/master/official/projects/yolo/ops/kmeans_anchors_test.py

You can reproduce it with https://github.com/pytorch/pytorch/issues/118865#issuecomment-1924745100 using the 8/16 GPU params. I've tested on a 94-CPU instance.

Are you able to reproduce it? I suppose so, since the instructions give a reproducible container/image sequence.

Can you try to re-run the same with PyTorch 2.0.x? This number of processes emerged with PyTorch Docker image upgrades, without any change to what is in the repo.

That is not a problem: you can run it on a smaller machine, as I see the same on 8 GPUs too.

It is also reproducible with fewer GPUs and fewer dataloader workers. E.g. I've tested the same with 4 dataloader workers. > Maybe there are some other processes being spawned by the...

I am testing this on every PyTorch nightly container and the number of spawned processes is very high. With `6` data workers and `16 gpus`, DDP or DP, I have 600+ processes...
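For context on why 600+ looks high, here is a rough back-of-the-envelope sketch of the baseline one might expect. It assumes one trainer process per GPU under DDP, a single trainer process under DP, and `num_workers` DataLoader worker processes per trainer. The helper name `expected_processes` is hypothetical, and launcher or other helper processes are deliberately not counted.

```python
def expected_processes(num_gpus: int, num_workers: int, ddp: bool = True) -> int:
    """Rough baseline for the number of Python processes in a training job.

    Assumptions (not taken from the thread):
    - DDP runs one trainer process per GPU; DP runs a single trainer process.
    - Each trainer process starts `num_workers` DataLoader worker processes.
    - Launcher and any other helper processes are ignored.
    """
    trainers = num_gpus if ddp else 1
    return trainers * (1 + num_workers)


if __name__ == "__main__":
    # 16 GPUs with 6 dataloader workers each
    print("DDP:", expected_processes(16, 6, ddp=True))
    print("DP: ", expected_processes(16, 6, ddp=False))
```

Under these assumptions the DDP case comes to 16 * (1 + 6) = 112 processes and the DP case to 7, so an observed count of 600+ would not be explained by trainers and dataloader workers alone.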

@tringwald Also, in your case, how are you going from 8 GPUs and 16 dataloader workers to ~400 Python processes? It seems that @ejguan is no longer the Dataloader codeowner, right?...

/cc @fmassa @soumith Can we update the owner?