
Multi-GPU training on a very large-scale dataset

Open Haroldzha opened this issue 2 years ago • 4 comments

Hello, I am currently training YOLOX on my own dataset, which is quite large (roughly 2M+ samples). Training on a single GPU works normally, but when I try to train on multiple GPUs on one machine there is a problem: with num_workers set to 4, 8, or 16, the training process gets stuck for a very long time and eventually runs out of memory. I traced it to the data prefetch part:

https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_prefetcher.py#L17

This line takes a very long time whenever num_workers is not 0. I suspect the iterator is the cause: it looks like each worker creates a full copy of the whole dataset, which is why training gets stuck and runs out of memory. With a small dataset, multi-GPU training works fine. I also tried setting num_workers to 0; training then runs, but it is even slower than on a single GPU, which is not acceptable for me. Would you mind providing any suggestions or ideas for avoiding the iterator in the prefetch code?
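
For reference, a commonly reported cause of this symptom is the copy-on-write behaviour of forked DataLoader workers: when the dataset keeps its annotations in Python lists or dicts, merely touching their reference counts during iteration makes each worker gradually materialise its own copy of that metadata. Below is a minimal sketch of the usual mitigation, keeping per-sample metadata in flat numpy arrays; the class and field names are hypothetical and not part of YOLOX.

```python
import numpy as np
from torch.utils.data import Dataset


class ArrayBackedDataset(Dataset):
    """Hypothetical sketch: keep per-sample metadata in flat numpy arrays so that
    forked DataLoader workers share the pages read-only instead of duplicating
    Python objects through copy-on-write reference counting."""

    def __init__(self, image_paths, boxes_per_image):
        # Fixed-width byte strings instead of a Python list of str objects.
        self.paths = np.array(image_paths).astype(np.bytes_)
        # Flatten the variable-length box lists into one buffer plus offsets.
        lengths = np.array([len(b) for b in boxes_per_image], dtype=np.int64)
        self.offsets = np.concatenate(([0], np.cumsum(lengths)))
        self.boxes = (
            np.concatenate(boxes_per_image).astype(np.float32)
            if len(boxes_per_image)
            else np.zeros((0, 5), dtype=np.float32)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx].decode()
        start, end = self.offsets[idx], self.offsets[idx + 1]
        boxes = self.boxes[start:end]  # a view into the shared buffer, no Python-object copy
        # ... load the image from `path` and apply augmentations here ...
        return path, boxes
```

With metadata stored this way, worker start-up no longer depends on deep-copying millions of Python objects, and in reports of this issue the per-worker resident memory stays roughly flat instead of growing toward a full dataset copy.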

Haroldzha avatar Jun 23 '22 06:06 Haroldzha

I switched to the mmdetection version but ran into another problem: it uses a lot of GPU memory. So sad, bro.

TengfeiHou avatar Sep 21 '22 06:09 TengfeiHou

Hello, I've just run into the same problem as you. Have you fixed it?

Mobu59 avatar Oct 17 '22 13:10 Mobu59

@Haroldzha Hi, is there any method to avoid this? I'm facing the same situation now. What if we don't use the prefetcher? I'm not sure whether the PyTorch DataLoader does prefetching automatically or not.
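
For what it's worth: when num_workers > 0, the PyTorch DataLoader already prefetches batches in its worker processes (controlled by prefetch_factor, default 2), so what the CUDA-side DataPrefetcher mainly adds is overlapping the host-to-device copy with compute. A minimal training-loop sketch without the prefetcher might look like the following; the toy dataset, model, and optimizer are placeholders, not YOLOX APIs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the sketch runs end to end; in practice these would be the
# real dataset and model.
train_dataset = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 10, (256,)))
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

loader = DataLoader(
    train_dataset,
    batch_size=16,
    num_workers=4,
    pin_memory=True,        # page-locked host memory enables async non_blocking copies
    prefetch_factor=2,      # each worker keeps 2 batches ready: the DataLoader's own prefetch
    persistent_workers=True,
)

for images, labels in loader:
    # Asynchronous host-to-device copies; without the CUDA-stream prefetcher the
    # overlap with compute is smaller, but correctness is unchanged.
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Whether dropping the prefetcher costs you noticeable throughput depends on how large the batches are relative to the per-iteration compute; it is worth benchmarking both variants on your setup.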

wcyjerry avatar Jul 18 '23 08:07 wcyjerry

Hello, thanks for pointing out that the cause is the number of workers; at least training no longer gets stuck with multiple GPUs, which I have no choice but to use, since I get NaN losses with a small batch size on my big dataset. It would still be helpful if anybody knows a faster solution.

YCAyca avatar Mar 26 '24 22:03 YCAyca