
Memory usage with multi-GPU training

Open · yuanyaaa opened this issue 1 year ago · 1 comment

When I use trlx to fine-tune Flan-T5-Large on a single GPU, memory usage is about 11GB; however, when I use accelerate for parallel training, it is 16GB on each of the 4 GPUs! I can't understand why. Can I keep per-GPU memory around 11GB for parallel training? Is the problem caused by the config? The accelerate config is:

distributed_type: MULTI_GPU 
downcast_bf16: 'no' 
gpu_ids: all 
machine_rank: 0 
main_training_function: main 
mixed_precision: 'no' 
num_machines: 1 
num_processes: 4 
rdzv_backend: static 
same_network: true 
tpu_env: [] 
tpu_use_cluster: false 
tpu_use_sudo: false 
use_cpu: false 
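
For reference, some overhead above the single-GPU footprint is expected with MULTI_GPU (DDP): each process holds a full model replica plus gradient buckets and NCCL communication buffers. A minimal sketch of one config change that often lowers per-GPU memory, assuming the GPUs support bf16 (not verified for this setup), would be replacing the mixed_precision line above with:

mixed_precision: bf16  # store activations/gradients in bf16 instead of fp32
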

Thank you very much for your reply!

yuanyaaa · Aug 22 '23 03:08