Trainer using only one GPU instead of two
System Info
transformers version: 4.26.0
python version: 3.8.8
pytorch version: 1.9.0+cu102
Who can help?
trainer: @sgugger, @muellerzr and @pacman100
Reproduction
I am trying to train a T5 model on two GPUs, but for some reason the Trainer only uses one.
In my bash file I request the number of GPUs I want to use like this:
#SBATCH --gres=gpu:2
and in my code i added this:
import os
# Make CUDA device IDs follow the PCI bus order, then expose GPUs 0 and 1
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
But when I check the number of GPUs in the training arguments, I always get 1:
print("build trainer with on device:", training_args.device, "with n gpus:", training_args.n_gpu)
The output:
build trainer with on device: cuda:0 with n gpus: 1
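As a sanity check, the number of devices PyTorch itself sees can be printed directly (a minimal sketch, assuming torch is importable in the same environment; the Trainer derives n_gpu from the devices visible to PyTorch):

import torch
# Counts only the GPUs exposed through CUDA_VISIBLE_DEVICES
print("visible GPUs:", torch.cuda.device_count())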
Expected behavior
I expect the Trainer to use both of the available GPUs.
How are you launching the Python script in your bash file?
This is the content of my bash script:
#!/bin/sh
# SBATCH options:
#SBATCH --job-name=DSI_gpu # Job name
#SBATCH --mail-type=END # Email notification
#SBATCH [email protected]
#SBATCH --ntasks=1 # Number of parallel tasks
#SBATCH --cpus-per-task=4
#SBATCH --partition=GPUNodes # Partition
#SBATCH --gres=gpu:2
#SBATCH --gres-flags=enforce-binding
# Processing
module purge
module load singularity/3.0.3
srun singularity exec /logiciels/containerCollections/CUDA11/pytorch-NGC-21-03-py3.sif $HOME/dr_env/bin/python3.8 "path/to/python/script.py"
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
You're launching with plain python. You should use either accelerate launch or torch.distributed.run instead; otherwise you'll get model parallelism, which isn't what you're aiming for.
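For illustration, the srun line could be adapted along these lines (a sketch only, assuming torch.distributed.run is available in the environment, which it is from PyTorch 1.9 onward; paths are the ones from the original script):

srun singularity exec /logiciels/containerCollections/CUDA11/pytorch-NGC-21-03-py3.sif $HOME/dr_env/bin/python3.8 -m torch.distributed.run --nproc_per_node=2 "path/to/python/script.py"

Or, assuming accelerate is installed in the same environment, the equivalent with accelerate launch:

accelerate launch --num_processes 2 "path/to/python/script.py"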