ChatGLM2-6B

[BUG/Help] LoRA fine-tuning fails with "RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead."

Open xlhuang132 opened this issue 1 year ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Fine-tuning with LoRA under data parallelism, I get the following error:

CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
bin /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
/usr/local/conda/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/conda did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/conda/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/extras/CUPTI/lib64'), PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 110
CUDA SETUP: Loading binary /usr/local/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so...
[2023-07-27 16:07:10,231] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-27 16:07:10,231] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-27 16:07:10,231] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-27 16:07:10,236] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-27 16:07:10,236] [INFO] [comm.py:594:init_distributed] cdb=None
Start running on rank 0.
Start running on rank 1.
loading init model...
loading init model...
Loading checkpoint shards: 100%|██████████████████████████████████| 7/7 [00:17<00:00, 2.54s/it]
{'': 0} memory_allocated 12516528640
Loading checkpoint shards: 100%|██████████████████████████████████| 7/7 [00:18<00:00, 2.63s/it]
{'': 1} memory_allocated 12516528640
==========print_trainable_parameters===========
trainable params: 1949696 || all params: 6245533696 || trainable%: 0.031217444255383614
/usr/local/conda/lib/python3.9/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
==========print_trainable_parameters===========
trainable params: 1949696 || all params: 6245533696 || trainable%: 0.031217444255383614
/usr/local/conda/lib/python3.9/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
[2023-07-27 16:07:36,417] [WARNING] [engine.py:1115:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
Rank: 0 partition count [2] and sizes[(974848, False)]
Rank: 1 partition count [2] and sizes[(974848, False)]
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 226, in <module>
    main()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 218, in main
    trainer.train()
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/conda/lib/python3.9/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-shangou-search/huangxiaolin07/lora_jobs/train_lora_dist_chatglm2_6b.py", line 77, in compute_loss
    return model(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1735, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 932, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 817, in forward
    full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
  File "/home/hadoop-shangou-search/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 688, in get_masks
    full_attention_mask -= padding_mask.unsqueeze(-1) - 1
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
  0%| | 0/70000 [00:00<?, ?it/s]
(The other rank prints an identical traceback ending in the same RuntimeError.)

Fine-tuning with P-Tuning works fine, and the base model files are exactly the same.
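For illustration, here is a minimal sketch of the operation that fails (shapes and values are made up; this is not the model's actual `get_masks` code): PyTorch rejects subtraction on a bool tensor.

```python
import torch

# Minimal sketch (shapes invented for illustration) of the failing operation in
# get_masks when the attention mask reaches the model as a bool tensor.
full_attention_mask = torch.ones(1, 4, 4)                 # placeholder float mask
padding_mask = torch.tensor([[True, True, True, False]])  # bool padding mask

try:
    full_attention_mask -= padding_mask.unsqueeze(-1) - 1  # same expression as modeling_chatglm.py line 688
except RuntimeError as err:
    print(err)  # Subtraction, the `-` operator, with a bool tensor is not supported. ...
```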

Expected Behavior

No response

Steps To Reproduce

Environment:

  • Python: 3.9.12
  • Transformers: 4.29.2
  • PyTorch: 2.0.1+cu117

Environment

- OS: 
- Python: 3.9.12
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

xlhuang132 avatar Jul 27 '23 08:07 xlhuang132

I've run into the same problem. Has it been solved? Thanks.

lileilai avatar Aug 08 '23 02:08 lileilai

No. The error is raised inside the transformers package, but fine-tuning with P-Tuning using the same package works fine.

xlhuang132 avatar Aug 09 '23 02:08 xlhuang132

Just don't pass the attention mask in the inputs, or cast it to another dtype; don't use bool.
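A minimal sketch of that caller-side workaround, assuming a custom `compute_loss` like the one in the traceback (the class and handling below are illustrative, not the original training script):

```python
import torch
from transformers import Trainer

class CastMaskTrainer(Trainer):
    # Illustrative sketch of the workaround above, not the original
    # train_lora_dist_chatglm2_6b.py: make sure the attention mask that reaches
    # ChatGLM2-6B is not a bool tensor, or drop it and let the model build its own.
    def compute_loss(self, model, inputs, return_outputs=False):
        mask = inputs.get("attention_mask")
        if mask is not None and mask.dtype == torch.bool:
            inputs["attention_mask"] = mask.long()
        # Alternative: inputs.pop("attention_mask", None)
        outputs = model(**inputs)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss
```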

hscspring avatar Aug 25 '23 08:08 hscspring

+1, ran into the same problem.

Change `full_attention_mask -= padding_mask.unsqueeze(-1) - 1` to `full_attention_mask -= padding_mask.unsqueeze(-1).int() - 1`. Source: https://huggingface.co/THUDM/chatglm2-6b/discussions/67#64c0df718e261225436fc783 (shibing624).
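A quick illustration of why the cast works (values made up): `True` becomes 1 and `False` becomes 0, so the subtraction no longer touches a bool tensor and padding positions end up contributing -1.

```python
import torch

# Illustration of the quoted fix (values made up): after .int(), True -> 1 and
# False -> 0, so padding_mask.int() - 1 is 0 at real tokens and -1 at padding.
padding_mask = torch.tensor([[True, True, False]])
print(padding_mask.unsqueeze(-1).int() - 1)
# tensor([[[ 0],
#          [ 0],
#          [-1]]])
```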

YiFraternity avatar Nov 07 '23 16:11 YiFraternity

Did you ever solve this? It's already 2024. It seems the `full_attention_mask -= padding_mask.unsqueeze(-1) - 1` line in modeling_chatglm.py needs to be changed, but the file lives in the .cache directory and editing it there has no effect, because the file gets refreshed every time. I really don't know what to do.
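One workaround that may help (a sketch; the local path and the `snapshot_download` step are assumptions, not something confirmed in this thread): keep your own copy of the model repository, apply the `.int()` fix to `modeling_chatglm.py` in that copy, and always load from the local directory, so the patched file is the one picked up instead of a freshly re-downloaded version.

```python
# Sketch of a possible workaround (paths are hypothetical): download the model
# repo once, patch modeling_chatglm.py inside the local copy, then always load
# from that copy so the patched file is used instead of a re-downloaded one.
from huggingface_hub import snapshot_download
from transformers import AutoModel, AutoTokenizer

local_dir = snapshot_download("THUDM/chatglm2-6b", local_dir="./chatglm2-6b-local")
# ...edit ./chatglm2-6b-local/modeling_chatglm.py here (apply the .int() fix)...

tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b-local", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b-local", trust_remote_code=True)
```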

liuxingchuih avatar Apr 17 '24 08:04 liuxingchuih

+1, just hit this problem as well.

dongdongzhaoUP avatar Aug 21 '24 09:08 dongdongzhaoUP