ChatGLM2-6B
[BUG/Help] LoRA fine-tuning fails with: RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Running LoRA fine-tuning with data parallelism fails with the following error:
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
bin /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
/usr/local/conda/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/usr/local/conda/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/extras/CUPTI/lib64'), PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
[2023-07-27 16:07:10,231] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-27 16:07:10,231] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-27 16:07:10,231] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-27 16:07:10,236] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-27 16:07:10,236] [INFO] [comm.py:594:init_distributed] cdb=None
Start running on rank 0.
Start running on rank 1.
loading init model...
loading init model...
Loading checkpoint shards: 100%|██████████████████████████████████| 7/7 [00:17<00:00, 2.54s/it]
{'': 0}
memory_allocated 12516528640
Loading checkpoint shards: 100%|██████████████████████████████████| 7/7 [00:18<00:00, 2.63s/it]
{'': 1}
memory_allocated 12516528640
==========print_trainable_parameters===========
trainable params: 1949696 || all params: 6245533696 || trainable%: 0.031217444255383614
/usr/local/conda/lib/python3.9/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
==========print_trainable_parameters===========
trainable params: 1949696 || all params: 6245533696 || trainable%: 0.031217444255383614
/usr/local/conda/lib/python3.9/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
[2023-07-27 16:07:36,417] [WARNING] [engine.py:1115:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
Rank: 0 partition count [2] and sizes[(974848, False)]
Rank: 1 partition count [2] and sizes[(974848, False)]
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 226, in <module>
    main()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 218, in main
    trainer.train()
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 77, in compute_loss
    return model(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1735, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 932, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 817, in forward
    full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 688, in get_masks
    full_attention_mask -= padding_mask.unsqueeze(-1) - 1
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
  0%|          | 0/70000 [00:00<?, ?it/s]
(The other rank raises an identical traceback.)
Fine-tuning with P-Tuning works fine, and the base model files are identical.
Expected Behavior
No response
Steps To Reproduce
Environment:
- Python: 3.9.12
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu117
Environment
- OS:
- Python: 3.9.12
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
I ran into the same problem. Did you ever solve it? Thanks.
No. The error is raised from the transformers stack, but P-Tuning fine-tuning works fine with the exact same packages.
Just don't pass the attention_mask argument with the inputs, or convert its dtype first. Don't feed it in as bool.
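A minimal sketch of that workaround, assuming a `compute_loss` override like the one shown in the traceback (the `LoraTrainer` name and the surrounding structure are illustrative, not the issue author's actual script):

```python
import torch
from transformers import Trainer

class LoraTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # Option 1: cast the bool attention_mask to an integer dtype before the forward pass.
        if "attention_mask" in inputs and inputs["attention_mask"].dtype == torch.bool:
            inputs["attention_mask"] = inputs["attention_mask"].long()
        # Option 2 (alternative): drop the mask entirely and let the model
        # rebuild it from input_ids.
        # inputs.pop("attention_mask", None)
        outputs = model(**inputs)
        return (outputs.loss, outputs) if return_outputs else outputs.loss
```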
+1, ran into this problem too.
Change `full_attention_mask -= padding_mask.unsqueeze(-1) - 1` to `full_attention_mask -= padding_mask.unsqueeze(-1).int() - 1`.
Source: shibing624, https://huggingface.co/THUDM/chatglm2-6b/discussions/67#64c0df718e261225436fc783
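For anyone who wants to verify the cast before patching the model file, here is a self-contained sketch of why the original line fails on a bool attention_mask and why the `.int()` cast fixes it (the tensor shapes below are illustrative, not taken from the model):

```python
import torch

# Illustrative shapes: batch size 1, sequence length 3.
padding_mask = torch.tensor([[True, True, False]])  # bool attention_mask
full_attention_mask = torch.ones(1, 3, 3)           # float mask

try:
    # Original line from get_masks(): arithmetic on a bool tensor raises.
    full_attention_mask -= padding_mask.unsqueeze(-1) - 1
except RuntimeError as e:
    print(e)  # Subtraction, the `-` operator, with a bool tensor is not supported...

# Patched line: cast to int first, so the subtraction is plain integer math.
full_attention_mask -= padding_mask.unsqueeze(-1).int() - 1
```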
Did you ever solve this? It is 2024 already. It looks like the `full_attention_mask -= padding_mask.unsqueeze(-1) - 1` line in modeling_chatglm.py needs to be changed, but the file sits in the .cache directory and editing it there does nothing, because the cached copy is refreshed every time. I really don't know what to do.
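One way around the cache problem, sketched under the assumption that you can keep a local snapshot of the model repo: download the weights and code once, patch modeling_chatglm.py inside that local directory, and load from the local path, so transformers uses your edited file instead of re-fetching the remote one (the local path below is hypothetical):

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical local snapshot, e.g. created once with:
#   git clone https://huggingface.co/THUDM/chatglm2-6b /data/models/chatglm2-6b
# Then edit the get_masks line in /data/models/chatglm2-6b/modeling_chatglm.py in place.
local_path = "/data/models/chatglm2-6b"

tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True)
```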
+1, just ran into this problem as well.