
Multi-GPU training problem

Mobu59 opened this issue 2 years ago • 9 comments

When I use multiple GPUs for training, it always gets stuck here. Please help me! The messages are as follows:

training args are: Namespace(batch_size=4, check_images=False, check_labels=False, conf_file='configs/yolov6_tiny_head_det.py', data_path='data/head_det.yaml', device='4,5,6,7', dist_url='tcp://127.0.0.1:8888', epochs=400, gpu_count=0, img_size=640, local_rank=0, name='exp', noval=False, output_dir='./runs/train', rank=0, workers=8, world_size=4)

Using 4 GPU for training... Initializing process group...

After that it just hangs for a long time and never proceeds.
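
For context, distributed training for this kind of script is normally launched with one process per GPU through torch.distributed; a minimal sketch of such a launch is below (the tools/train.py path and the exact flag spellings are assumptions mirroring the Namespace fields above, not taken from this issue):

```
# Sketch only: one process per GPU via torch.distributed.launch.
# Flag names (--batch, --conf, --data, --device) are assumptions based on the Namespace fields.
python -m torch.distributed.launch --nproc_per_node 4 \
    tools/train.py \
    --batch 4 \
    --conf configs/yolov6_tiny_head_det.py \
    --data data/head_det.yaml \
    --device 4,5,6,7
```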

Mobu59 avatar Jun 30 '22 07:06 Mobu59

Thanks for your attention. We will try our best to solve your problem, but more concrete information is needed to reproduce it. The issue you describe depends heavily on your hardware, so please provide more specific error information.

meituan-gengyifei avatar Jun 30 '22 07:06 meituan-gengyifei

I ran into the same problem; I can run it with a single GPU. (screenshots attached)

Guan-LinHe avatar Jun 30 '22 07:06 Guan-LinHe

@GuanLinHu Same here: I can train with a single GPU, but when I use multiple GPUs I hit the problem above!

Mobu59 avatar Jun 30 '22 07:06 Mobu59

@Mobu59 I tried the approach that the author of YOLOv5 said should work, but it did not help. (screenshot attached)

Guan-LinHe avatar Jun 30 '22 07:06 Guan-LinHe

@Mobu59 @GuanLinHu Sorry, I haven't been able to reproduce your problem yet. If there is nothing wrong with the data, it may be caused by multi-threading and a deadlock. You could try OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python train.py, or set --workers 0, to check whether the problem still exists (see the sketch below).
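
Spelled out as commands, the two workarounds above would look roughly like this (a sketch only; the --workers flag is taken from the Namespace in the first post and its exact spelling may differ):

```
# Limit OpenMP/MKL to a single thread each to rule out a threading deadlock
OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python train.py

# Or disable DataLoader worker processes entirely
python train.py --workers 0
```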

meituan-gengyifei avatar Jun 30 '22 08:06 meituan-gengyifei

What's wrong in my case? I can't use multi-GPU training either, and I already added NCCL_P2P_LEVEL=0 before python train.py. (screenshot attached)
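
If NCCL_P2P_LEVEL=0 on its own does not help, enabling NCCL's own logging is a common way to see where the process group setup stalls; a sketch (train.py arguments omitted):

```
# NCCL_DEBUG=INFO prints NCCL initialization details to the console;
# NCCL_P2P_DISABLE=1 turns off the peer-to-peer transport entirely.
NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1 python train.py
```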

RooKichenn avatar Jun 30 '22 10:06 RooKichenn

I also ran into this problem. After changing tcp:// to "env://", training works normally.
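
A sketch of what that change looks like at launch time, assuming the dist_url value from the Namespace above is exposed as a --dist_url flag (the flag spelling is an assumption):

```
# env:// makes init_process_group read MASTER_ADDR, MASTER_PORT, RANK and
# WORLD_SIZE from environment variables, which torch.distributed.launch sets.
python -m torch.distributed.launch --nproc_per_node 4 train.py --dist_url env://
```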

Caius-Lu avatar Jul 13 '22 07:07 Caius-Lu

It is probably an issue with the torch version; I am using 1.9.1.
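
If the hang really is version-related, pinning torch to the version reported to work is a quick check (1.9.1 here; the matching torchvision version is an assumption):

```
pip install torch==1.9.1 torchvision==0.10.1
```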

Caius-Lu avatar Jul 13 '22 07:07 Caius-Lu

I have the problem of not being able to train the model on multiple GPUs on Windows. Has anyone managed multi-GPU training there?

anhuong98 avatar Jul 21 '22 13:07 anhuong98