
Feature request: Specifying GPU IDs

Open tmabraham opened this issue 2 years ago • 1 comment

It would be ideal to be able to specify the GPU IDs that a script is allowed to use. I typically include a GPU ID argument in my scripts so that I can set the device myself, but when Accelerate is handling the devices I have no way to specify it. The current alternative is to set CUDA_VISIBLE_DEVICES, but a dedicated option in accelerate config or on the Accelerator object would be ideal.
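For context, the workaround mentioned above looks like this (a minimal sketch; `train.py` is a placeholder for whatever script you launch):

```
# Expose only physical GPU 1 to the process; Accelerate then sees it as cuda:0
CUDA_VISIBLE_DEVICES=1 accelerate launch train.py

# Expose a subset of GPUs for a multi-GPU run
CUDA_VISIBLE_DEVICES=0,2 accelerate launch train.py
```

The limitation is that this lives in the environment rather than in Accelerate's own configuration, which is what this request asks for.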

tmabraham avatar Jul 21 '22 03:07 tmabraham

@muellerzr I want to work on it

tanmoyio avatar Jul 26 '22 13:07 tanmoyio

How can I change from GPU 0 to another GPU? @muellerzr I am using Accelerate and get the error below; please give a detailed solution.

/home/suresh/myenv/lib/python3.8/site-packages/accelerate/accelerator.py:391: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
  warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
{'clip_sample_range', 'timestep_spacing', 'trained_betas', 'sample_max_value', 'variance_type', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'num_attention_heads', 'attention_type', 'addition_time_embed_dim', 'dropout', 'reverse_transformer_layers_per_block', 'transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
accelerator.device = cuda
accelerator.device = cuda
Traceback (most recent call last):
  File "/home/suresh/lora/diffusers/examples/kandinsky2_2_train/tune_decoder_lora.py", line 589, in <module>
    main()
  File "/home/suresh/lora/diffusers/examples/kandinsky2_2_train/tune_decoder_lora.py", line 362, in main
    image_encoder.to(accelerator.device)
  File "/home/suresh/myenv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2576, in to
    return super().to(*args, **kwargs)
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/suresh/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 10.75 GiB of which 17.75 MiB is free. Process 2304721 has 9.66 GiB memory in use. Including non-PyTorch memory, this process has 1.05 GiB memory in use. Of the allocated memory 854.74 MiB is allocated by PyTorch, and 65.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
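The traceback itself points at the problem: GPU 0 is nearly full (process 2304721 already holds 9.66 GiB of its 10.75 GiB), so the run goes out of memory as soon as it moves the image encoder onto that card. The usual fix is to point the run at a different device. A minimal sketch, assuming GPU 1 on your machine is free:

```
# Hide GPU 0 so the run lands on the free card (which the process sees as cuda:0)
CUDA_VISIBLE_DEVICES=1 accelerate launch tune_decoder_lora.py

# Newer Accelerate versions (the feature this issue requested) also accept
# the ids directly at launch time, if your installed version supports it
accelerate launch --gpu_ids 1 tune_decoder_lora.py
```

Either way, the goal is the same: keep the training process off the GPU that another process already occupies.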

skfirebox avatar Apr 15 '24 22:04 skfirebox