bisenet-tensorflow
How to start multi-GPU training to accelerate the training process?
I start training with train.py, but the process is lengthy since only one GPU is involved in training. So how do I start multi-GPU training? I have 8 GPUs in my machine. Thank you.
Check here: https://github.com/pdoublerainbow/bisenet-tensorflow/blob/master/train.py#L94. If you want to make use of all the free GPUs, comment out that line. But are you sure all the GPUs are free for you alone? If not, you can change the method auto_select_gpu() a little (here: https://github.com/pdoublerainbow/bisenet-tensorflow/blob/master/utils/misc_utils.py#L32) to make use of the three GPUs with the most free memory.
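A minimal sketch of that modification, assuming auto_select_gpu() works by parsing nvidia-smi. The actual code in utils/misc_utils.py may differ; the name auto_select_gpus and the parameter k below are illustrative:

```python
# Sketch: extend auto_select_gpu() to export the k GPUs with the most free
# memory instead of a single one. The nvidia-smi parsing is an assumption;
# adapt it to whatever utils/misc_utils.py actually does.
import os
import subprocess

def auto_select_gpus(k=3):
    """Set CUDA_VISIBLE_DEVICES to the k GPUs with the most free memory."""
    # One integer (MiB of free memory) per line, in GPU-index order.
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.free',
         '--format=csv,noheader,nounits']).decode()
    free = [int(line) for line in out.strip().split('\n')]
    # Indices of the k GPUs with the most free memory.
    top = sorted(range(len(free)), key=lambda i: free[i], reverse=True)[:k]
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in sorted(top))

auto_select_gpus(3)  # e.g. exports CUDA_VISIBLE_DEVICES='0,1,3'
```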
I simply changed the code at line 94 to os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3' in order to make use of four GPUs, but it seems the training program only uses GPU 0, not all four:
```
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     29528      C   python                                      8847MiB |
|    1     29528      C   python                                       209MiB |
|    2     29528      C   python                                       209MiB |
|    3     29528      C   python                                       209MiB |
+-----------------------------------------------------------------------------+
```
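The pattern above, nearly full memory on GPU 0 but only ~200 MiB on each of the others, is typical of TensorFlow 1.x when several GPUs are visible but the graph only places ops on /gpu:0: TF touches every visible device (the small footprints suggest allow_growth is enabled), yet computes only where ops are explicitly placed. To make the other cards do work, the graph needs one model tower per device. Below is a minimal, self-contained data-parallel sketch with a toy dense layer standing in for the real network; it is not the repo's actual train.py, which may or may not already build towers like this:

```python
# Minimal TF1 multi-tower (data-parallel) sketch: one model replica per GPU,
# gradients averaged across towers and applied once. The dense layer is a
# stand-in for the real network.
import numpy as np
import tensorflow as tf

num_gpus = 4
x = tf.placeholder(tf.float32, [num_gpus * 8, 32])  # global batch of dummy features
y = tf.placeholder(tf.float32, [num_gpus * 8, 1])   # dummy targets
opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

# Split the global batch into one slice per GPU.
x_slices = tf.split(x, num_gpus)
y_slices = tf.split(y, num_gpus)

tower_grads = []
for i in range(num_gpus):
    # reuse=True after the first tower so all towers share one set of weights.
    with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=(i > 0)):
        pred = tf.layers.dense(x_slices[i], 1, name='fc')  # toy model
        loss = tf.reduce_mean(tf.square(pred - y_slices[i]))
        tower_grads.append(opt.compute_gradients(loss))

# Average each variable's gradient across towers, then apply once.
avg_grads = [(tf.add_n([g for g, _ in gvs]) / num_gpus, gvs[0][1])
             for gvs in zip(*tower_grads)]
train_op = opt.apply_gradients(avg_grads)

# allow_soft_placement lets this run even on a machine with fewer GPUs.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, {x: np.random.rand(num_gpus * 8, 32).astype(np.float32),
                        y: np.random.rand(num_gpus * 8, 1).astype(np.float32)})
```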
You should use nvidia-smi in the terminal to check your GPUs' free memory. @billplzhang
Before I start train.py:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   28C    P8    15W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   29C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 46%   78C    P2    88W / 250W |   4433MiB / 11178MiB |     70%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   35C    P8    15W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 23%   38C    P8    15W / 250W |   8876MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:86:00.0 Off |                  N/A |
| 23%   31C    P8    16W / 250W |   6848MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 23%   26C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 26%   42C    P2    56W / 250W |  11056MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
After I start train.py (specifying GPUs 0, 1, 3 as visible):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   38C    P2    71W / 250W |   8859MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 23%   29C    P8    16W / 250W |    221MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 43%   73C    P2    88W / 250W |   4435MiB / 11178MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   35C    P8    15W / 250W |    221MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:85:00.0 Off |                  N/A |
| 23%   38C    P2    55W / 250W |   8876MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 108...  Off  | 00000000:86:00.0 Off |                  N/A |
| 23%   31C    P8    16W / 250W |   6848MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 23%   26C    P8    16W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 108...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 23%   41C    P2    56W / 250W |  11056MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
Apparently, GPUs 1 and 3 are not used in the training process. So how do I utilize the three GPUs together?
You just need to modify the batch size in configuration.py.
How do I modify it? A larger batch size?
Yes, increase it according to your free memory size. @billplzhang
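For reference, the idea is that the global batch gets split across the visible GPUs (assuming train.py builds one tower per device), so you scale the batch with the GPU count while keeping each card's share inside its free memory. The names below are illustrative, not the actual keys in configuration.py:

```python
# Illustrative configuration.py values; the real key names in the repo may differ.
NUM_GPUS = 3        # GPUs exposed via CUDA_VISIBLE_DEVICES, e.g. '0,1,3'
BATCH_PER_GPU = 8   # chosen so one tower fits in a single card's free memory
BATCH_SIZE = BATCH_PER_GPU * NUM_GPUS   # global batch size consumed by train.py
```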