alpaca-lora
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Has anyone encountered this error?
Yes. You probably have a machine with more than one gpu right?
I ran finetune.py on a machine with 8 A100 GPUs. Is there a way to solve this problem?
Add:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
At the beginning of finetune.py
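For reference, a minimal sketch of that edit (the import and the placement note are my additions; the variable should be set before torch is imported so it is picked up when CUDA initializes):

import os

# Restrict this process to a single GPU so every tensor ends up on the same device.
# Set this before importing torch (or anything else that initializes CUDA).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # import torch only after the environment variable is set

The trade-off is that training then runs on a single GPU; to actually use all of them, see the torchrun invocation below.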
If you want to use all the GPUs, invoke with torchrun. For instance, with 2 GPUs I'd run:
OMP_NUM_THREADS=4 WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py
Make sure your global batch size is consistent with the number of GPUs and the micro batch size, so that gradient_accumulation_steps is an integer: global batch size = GPUs * micro batch size * accumulation steps.
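To illustrate the arithmetic, here is a standalone sketch with example numbers (not code taken from finetune.py):

# global batch size = gpus * micro batch size * gradient accumulation steps
gpus = 2                 # number of torchrun processes / GPUs
batch_size = 256         # global batch size passed to finetune.py
micro_batch_size = 64    # per-device batch size
assert batch_size % (gpus * micro_batch_size) == 0, "pick sizes that divide evenly"
gradient_accumulation_steps = batch_size // (gpus * micro_batch_size)
print(gradient_accumulation_steps)  # -> 2 for these values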
@AngainorDev any way to do what you said programmatically inside the finetune.py code?
@AngainorDev I used 4 GPUs and hit the multi-GPU problem as well. It seems to be blocked at the Map part; I believe something is wrong with the data splitting.
OMP_NUM_THREADS=4 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 finetune.py --base_model '/root/models/llama_7B/' \
  --data_path './alpaca_data_cleaned.json' \
  --output_dir './lora-alpaca' \
  --batch_size 256 \
  --micro_batch_size 64 \
  --num_epochs 3 \
  --learning_rate 1e-4 \
  --cutoff_len 512 \
  --val_set_size 2000 \
  --lora_r 8 \
  --lora_alpha 16 \
  --lora_dropout 0.05 \
  --lora_target_modules '[q_proj,v_proj]' \
  --train_on_inputs \
  --group_by_length
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDhzbWzWvsv6oCCKJLN9iSKhCJTaiJU1JxftMX5WYlGEoW8om+ikjo/iOPQV5jo2d2f29vx5j7oJ7xWARBV2yTerPfwYHNGhxA89s1sT+o2WzkrAU1boEnKHO/U/QgYkY7gxh+Q9WYTaF3J0b64UlqD2njWV8SFOTbrSAU9ZZh9gjqykO8bVWdX0o3PxedYkAOQr5PtAUtz7vJyaPB/PN6moCSuTlenG7M5f9YOw4WbGrDJBg7Plk/Ntb+X0xygMRjj7yer3a9ynANM0HkmqRSWLTt5UdunC/ElK+FTDX9COfvbLiaw3yCx05W3vKuMu0XvQ4h1ylM6m+npLwucStpATCnBkCQfolWcni6jA3yF3kQ1TpM4JfWPhYRm1Xtk0KBdH5T+YfjFu6zQJYskiNnzMK+FCymm3UTcSB8rEp89LeSSgBdkAnyCBcScCFkJtts4AGOJ5P+BeJ1YLq3tpBq/t5naDqkbY+QTJ6GgbLQv8wuCET9GXjV/CuKddDnJHXU= bytedance@C02G463BMD6R')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('noninteractive SHELL=/bin/bash')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic__q9nvgvw/none_qtyadj12/attempt_0/3/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda116.so...
(the same bitsandbytes banner, CUDA setup warnings, and libbitsandbytes loading messages are printed once per torchrun process; the other three copies are omitted here)
Training Alpaca-LoRA model with params:
base_model: /root/models/llama_7B/
data_path: ./alpaca_data_cleaned.json
output_dir: ./lora-alpaca
batch_size: 256
micro_batch_size: 64
num_epochs: 3
learning_rate: 0.0001
cutoff_len: 512
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: True
resume_from_checkpoint: None
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
(the same parameter dump and torch_dtype override warning are printed by each of the other three torchrun processes)
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.89s/it]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-013f535ddf518dde.arrow and /root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cf4d846d5a5d2565.arrow
Map:   0%|          | 0/49942 [00:00<?, ? examples/s]
Map:   8%|██        | 3761/49942 [00:02<00:34, 1334.61 examples/s]
(each of the four processes prints the same checkpoint-loading, dataset, and trainable-params messages; the Map progress then stalls at this point)
any way to do what you said programmatically inside the finetune.py code?
No, you have to run the .py script via torchrun instead of bare python.
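That said, the closest thing to a programmatic entry point is a thin launcher that still shells out to torchrun; a rough sketch, where the GPU count, port, and argument forwarding are just example choices:

import os
import subprocess
import sys

# Hypothetical launcher: it does not replace torchrun, it just invokes it,
# forwarding any extra command-line arguments on to finetune.py.
env = dict(os.environ, OMP_NUM_THREADS="4", WORLD_SIZE="2", CUDA_VISIBLE_DEVICES="0,1")
cmd = ["torchrun", "--nproc_per_node=2", "--master_port=1234", "finetune.py"] + sys.argv[1:]
subprocess.run(cmd, env=env, check=True)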
It seems to be blocked at the Map part; I believe something is wrong with the data splitting.
I only tried with 2 GPUs myself. I'd try with just 2, changing WORLD_SIZE and CUDA_VISIBLE_DEVICES, to confirm whether that is the cause or there is a deeper issue. If it's not that, I have no clue at the moment; I can only say it works for me.
Thanks, It works for me!!!
Using 4 GPUs also works.
@Jeffwan I have the same problem as you. Do you have any idea how to solve this?
@QinlongHuang Make sure your batch setting is correct. You can check more details here https://github.com/tloen/alpaca-lora/issues/188.
Thanks for your reply! But it does not work for me...
It does work on another machine of mine with 4xV100 16GB and CUDA 11.7 installed.
Here is my command for running the finetune.py script:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py \
--batch_size 128 \
--micro_batch_size 64
Do you get good results from using that batch_size and micro_batch_size?
@QinlongHuang If you want to use 4 GPUs, you need to change nproc_per_node to 4. Make sure batch_size / number of GPUs / micro_batch_size is an integer. In my case, 1024 / 4 / 128 = 2.
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 finetune.py \
--base_model '/root/llama-7b-hf/' \
--data_path './alpaca_data_cleaned.json' \
--output_dir './lora-alpaca' \
--batch_size 1024 \
--micro_batch_size 128
@Jeffwan Thanks for your patient answer, but I found it is a problem with the NCCL backend on 4090 cards.
@lksysML By setting NCCL_P2P_DISABLE=1, I finally solved this problem.
See #250 for more details.
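For anyone hitting the same hang, the variable only needs to be present in the environment of the torchrun launch, e.g. prefixed to the 4-GPU command used earlier in this thread (only the launcher part shown; the finetune.py arguments stay the same):

NCCL_P2P_DISABLE=1 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 finetune.py

NCCL_P2P_DISABLE=1 tells NCCL not to use peer-to-peer transfers between GPUs, which is a common workaround on cards like the 4090 where P2P is not available.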