Many CPU cores are unused
Hello, I have encountered the same problem as https://github.com/open-mmlab/mmdetection/issues/10761.
I am launching the following script:
./mmdetection/tools/dist_train.sh ./mmdetection/configs/mask_rcnn/mask-rcnn_r50_fpn_1x_coco.py 4
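For context, the "workers" I vary below are the dataloader worker processes. A minimal sketch of where that knob lives, assuming the standard MMDetection 3.x `train_dataloader.num_workers` field (the values are illustrative, not my exact config):

```python
# Illustrative config override (not my exact settings): the dataloader worker
# count is the num_workers field of the MMEngine/MMDetection dataloader config.
_base_ = ['./mask-rcnn_r50_fpn_1x_coco.py']

train_dataloader = dict(
    batch_size=20,            # per-GPU train batch size
    num_workers=2,            # CPU worker processes feeding each GPU
    persistent_workers=True,  # keep workers alive between epochs
)
```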
Conda env summary:
- python=3.8.19=h955ad1f_0
- numpy==1.23.5
- opencv-python==4.9.0.80
- pytorch=1.13.1=py3.8_cuda11.7_cudnn8.5.0_0
- pytorch-cuda=11.7=h778d358_5
- mmcv==2.1.0
- mmengine==0.10.4
- mmpretrain==1.2.0
- mmdet==3.3.0
Train batch size: 20
Hardware setup:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 6
CPU max MHz: 3600,0000
CPU min MHz: 800,0000
BogoMIPS: 6000.00
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 1,1 MiB (24 instances)
L1i: 768 KiB (24 instances)
L2: 30 MiB (24 instances)
L3: 36 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Vulnerabilities:
Gather data sampling: Mitigation; Microcode
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
GPUs: 4x NVIDIA RTX 6000 Ada Generation (49140 MiB), Driver Version: 535.104.05, CUDA Driver Version: 12.2
The fewer workers I use, the faster training goes and the more stable GPU utilization is.
With many workers: (screenshot)
With only 2 workers: (screenshot)
Using the NVIDIA Nsight Systems profiler, I can see that many CPU cores are simply not utilized.
I have run the same experiment on another hardware setup, where increasing the number of workers does increase the training speed.
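If it helps with debugging, here is a small diagnostic sketch (generic Python, not MMDetection-specific; sched_getaffinity is Linux-only) that I can run inside the training processes to show how many cores each one is allowed to use and how many threads the common libraries will spawn:

```python
# Quick diagnostic sketch: how many CPU cores is this process allowed to use,
# and how many threads will the usual libraries spawn? Distributed launchers
# often default OMP_NUM_THREADS / MKL_NUM_THREADS to 1 when they are unset,
# which can leave most cores idle.
import os

import cv2
import torch

print("cores visible to this process:", len(os.sched_getaffinity(0)))
print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS"))
print("MKL_NUM_THREADS =", os.environ.get("MKL_NUM_THREADS"))
print("torch.get_num_threads() =", torch.get_num_threads())
print("cv2.getNumThreads() =", cv2.getNumThreads())
```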
Could you give me any advice? Should I update any drivers?