coot-videotext
[BUG] multi gpu training without --single_gpu
Describe the bug
Problem with multi-GPU training when I remove --single_gpu.
Expected behavior
It detects the available GPUs.


If you have solved it, please consider posting your fix for others.
Did you solve this problem?
Does it still happen? If yes, please post a complete bug report: which command you ran, the complete error message, the output of the system command "nvidia-smi", and your system / Python / PyTorch versions. Then I will look into it.
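If it helps, the Python-side part of that info can be collected with a few lines like these (just an illustration, not part of the repository):

```python
# Print the environment info requested above (illustrative only).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  device {i}: {torch.cuda.get_device_name(i)}")
```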
command :

message:

output of system command "nvidia-smi":

System Info:
- OS: Ubuntu 18.04
- Python version: 3.8.5
- PyTorch version: 1.8.1
I changed some code in utils_torch.py:
1.
before:
after:

But the model still uses only one GPU, device 0.
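For comparison, the usual plain-PyTorch way to run a model on several GPUs is something like the sketch below (wrap_for_multi_gpu is an illustrative name, not the actual code from utils_torch.py):

```python
# Illustrative sketch of plain-PyTorch multi-GPU wrapping, for comparison only.
import torch
import torch.nn as nn

def wrap_for_multi_gpu(model: nn.Module) -> nn.Module:
    """Move the model to GPU and wrap it in DataParallel if several GPUs are visible."""
    if not torch.cuda.is_available():
        return model  # keep the model on CPU
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        # DataParallel replicates the module on every visible device and
        # splits each batch along dim 0; outputs are gathered on device 0.
        model = nn.DataParallel(model)
    return model
```

Note that with DataParallel, device 0 still hosts the gathered outputs, so seeing activity concentrated on device 0 in nvidia-smi does not by itself mean the other GPUs are unused.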
I will check this problem; it should be possible to train on multiple GPUs. Other than that, unless you increase the model size or batch size, a single 12GB GPU is more than enough to train the retrieval model.