alpaca-lora
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Has anyone encountered this error?
Yes. You probably have a machine with more than one gpu right?
I ran finetune.py on a machine with 8 A100 GPUs. Is there a way to solve this problem?
Add:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
At the beginning of finetune.py
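For reference, a minimal sketch of that edit (the import and the placement note are my additions; the variable should be set before torch is imported so it is picked up when CUDA initializes):

import os

# Restrict this process to a single GPU so every tensor ends up on the same device.
# Set this before importing torch (or anything else that initializes CUDA).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # import torch only after the environment variable is set

The trade-off is that training then runs on a single GPU; to actually use all of them, see the torchrun invocation below.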
If you want to use all the GPUs, invoke with torchrun. For instance, with 2 GPUs I'd run:
OMP_NUM_THREADS=4 WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py
Make sure your global batch size is consistent with the number of GPUs and the micro batch size, so that gradient_accumulation_steps is an integer: global batch size = GPUs * micro batch size * accumulation steps.
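To illustrate the arithmetic, here is a standalone sketch with example numbers (not code taken from finetune.py):

# global batch size = gpus * micro batch size * gradient accumulation steps
gpus = 2                 # number of torchrun processes / GPUs
batch_size = 256         # global batch size passed to finetune.py
micro_batch_size = 64    # per-device batch size
assert batch_size % (gpus * micro_batch_size) == 0, "pick sizes that divide evenly"
gradient_accumulation_steps = batch_size // (gpus * micro_batch_size)
print(gradient_accumulation_steps)  # -> 2 for these values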
@AngainorDev any way to do what you said programmatically inside the finetune.py code?
@AngainorDev I used 4 GPUs and hit the multi-GPU problem as well. It seems to be blocked at the Map part; I believe something is wrong with the data splitting.
OMP_NUM_THREADS=4 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 finetune.py --base_model '/root/models/llama_7B/' \
  --data_path './alpaca_data_cleaned.json' \
  --output_dir './lora-alpaca' \
  --batch_size 256 \
  --micro_batch_size 64 \
  --num_epochs 3 \
  --learning_rate 1e-4 \
  --cutoff_len 512 \
  --val_set_size 2000 \
  --lora_r 8 \
  --lora_alpha 16 \
  --lora_dropout 0.05 \
  --lora_target_modules '[q_proj,v_proj]' \
  --train_on_inputs \
  --group_by_length
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDhzbWzWvsv6oCCKJLN9iSKhCJTaiJU1JxftMX5WYlGEoW8om+ikjo/iOPQV5jo2d2f29vx5j7oJ7xWARBV2yTerPfwYHNGhxA89s1sT+o2WzkrAU1boEnKHO/U/QgYkY7gxh+Q9WYTaF3J0b64UlqD2njWV8SFOTbrSAU9ZZh9gjqykO8bVWdX0o3PxedYkAOQr5PtAUtz7vJyaPB/PN6moCSuTlenG7M5f9YOw4WbGrDJBg7Plk/Ntb+X0xygMRjj7yer3a9ynANM0HkmqRSWLTt5UdunC/ElK+FTDX9COfvbLiaw3yCx05W3vKuMu0XvQ4h1ylM6m+npLwucStpATCnBkCQfolWcni6jA3yF3kQ1TpM4JfWPhYRm1Xtk0KBdH5T+YfjFu6zQJYskiNnzMK+FCymm3UTcSB8rEp89LeSSgBdkAnyCBcScCFkJtts4AGOJ5P+BeJ1YLq3tpBq/t5naDqkbY+QTJ6GgbLQv8wuCET9GXjV/CuKddDnJHXU= bytedance@C02G463BMD6R')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('noninteractive SHELL=/bin/bash')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic__q9nvgvw/none_qtyadj12/attempt_0/3/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda116.so...
(the same bitsandbytes banner, CUDA setup warnings, and libbitsandbytes loading messages are printed once per torchrun process; the other three copies are omitted here)
Training Alpaca-LoRA model with params:
base_model: /root/models/llama_7B/
data_path: ./alpaca_data_cleaned.json
output_dir: ./lora-alpaca
batch_size: 256
micro_batch_size: 64
num_epochs: 3
learning_rate: 0.0001
cutoff_len: 512
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: True
resume_from_checkpoint: None
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
(the same parameter dump and torch_dtype override warning are printed by each of the other three torchrun processes)
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.89s/it]
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-013f535ddf518dde.arrow and /root/.cache/huggingface/datasets/json/default-af841b888df60f8e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cf4d846d5a5d2565.arrow
Map:   0%|          | 0/49942 [00:00<?, ? examples/s]
Map:   8%|██        | 3761/49942 [00:02<00:34, 1334.61 examples/s]
(each of the four processes prints the same checkpoint-loading, dataset, and trainable-params messages; the Map progress then stalls at this point)
any way to do what you said programmatically inside the finetune.py code?
No, you have to run the .py script via torchrun instead of bare python.
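That said, the closest thing to a programmatic entry point is a thin launcher that still shells out to torchrun; a rough sketch, where the GPU count, port, and argument forwarding are just example choices:

import os
import subprocess
import sys

# Hypothetical launcher: it does not replace torchrun, it just invokes it,
# forwarding any extra command-line arguments on to finetune.py.
env = dict(os.environ, OMP_NUM_THREADS="4", WORLD_SIZE="2", CUDA_VISIBLE_DEVICES="0,1")
cmd = ["torchrun", "--nproc_per_node=2", "--master_port=1234", "finetune.py"] + sys.argv[1:]
subprocess.run(cmd, env=env, check=True)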
It seems to be blocked at the Map part; I believe something is wrong with the data splitting.
I only tried with 2 GPUs myself. I'd try with just 2, changing WORLD_SIZE and CUDA_VISIBLE_DEVICES, to confirm whether that is the cause or there is a deeper issue. If it's not that, I have no clue at the moment; I can only say it works for me.
Thanks, It works for me!!!
Using 4 GPUs also works.
@Jeffwan I have the same problem as you. Do you have any idea how to solve this?
@QinlongHuang Make sure your batch setting is correct. You can check more details here https://github.com/tloen/alpaca-lora/issues/188.
Thanks for your reply! But it does not work for me...
It does work on another machine of mine with 4xV100 16GB and CUDA 11.7 installed.
Here is my command for running the finetune.py script:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py \
--batch_size 128 \
--micro_batch_size 64
Do you get good results from using that batch_size and micro_batch_size?
@QinlongHuang If you want to use 4 GPUs, you need to change nproc_per_node to 4. Make sure batch_size / number of GPUs / micro_batch_size is an integer. In my case, 1024 / 4 / 128 = 2.
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 finetune.py \
--base_model '/root/llama-7b-hf/' \
--data_path './alpaca_data_cleaned.json' \
--output_dir './lora-alpaca' \
--batch_size 1024 \
--micro_batch_size 128
@Jeffwan Thanks for your patient answer, but I found it is a problem with the NCCL backend on 4090 cards.
@lksysML By setting NCCL_P2P_DISABLE=1, I finally solved this problem.
See #250 for more details.
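For anyone hitting the same hang, the variable only needs to be present in the environment of the torchrun launch, e.g. prefixed to the 4-GPU command used earlier in this thread (only the launcher part shown; the finetune.py arguments stay the same):

NCCL_P2P_DISABLE=1 WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 finetune.py

NCCL_P2P_DISABLE=1 tells NCCL not to use peer-to-peer transfers between GPUs, which is a common workaround on cards like the 4090 where P2P is not available.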