
[BUG] Fine-tuning MiniCPM-Llama3-V-2_5-int4 fails with: RuntimeError: only Tensors of floating point dtype can require gradients

Open · KeepFaithMe opened this issue 1 year ago · 1 comment

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • [X] I have searched the FAQ

Current Behavior

Test environment: torch==2.1.2, torchvision==0.16.0; GPU: 4060 Ti with 16 GB of VRAM. The finetune_lora.sh file is listed further below.

The following error appears when running a fine-tuning test with MiniCPM-Llama3-V-2_5-int4:

[2024-09-07 13:00:42,421] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
[2024-09-07 13:00:43,534] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-09-07 13:00:43,534] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
WARNING:root:FSDP or ZeRO3 are not incompatible with QLoRA.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Currently using LoRA for fine-tuning the MiniCPM-V model.
Traceback (most recent call last):
  File "/work/MiniCPM-V/finetune/finetune.py", line 299, in <module>
    train()
  File "/work/MiniCPM-V/finetune/finetune.py", line 243, in train
    model = get_peft_model(model, lora_config)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/mapping.py", line 179, in get_peft_model
    return PeftModel(model, peft_config, adapter_name=adapter_name, autocast_adapter_dtype=autocast_adapter_dtype)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/peft_model.py", line 155, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 139, in __init__
    super().__init__(model, config, adapter_name)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 175, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 417, in inject_adapter
    new_module = ModulesToSaveWrapper(target, adapter_name)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/utils/other.py", line 195, in __init__
    self.update(adapter_name)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/utils/other.py", line 245, in update
    self.modules_to_save[adapter_name].requires_grad_(True)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2440, in requires_grad_
    p.requires_grad_(requires_grad)
RuntimeError: only Tensors of floating point dtype can require gradients
[2024-09-07 13:00:50,735] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2187) of binary: /root/miniconda3/envs/minicpm/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/minicpm/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

finetune.py FAILED

Failures: <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2024-09-07_13:00:50
  host      : 555898d76c84
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2187)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

The contents of finetune_lora.sh are as follows:

GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/work/MiniCPM-V/check_point/OpenBMB/MiniCPM-Llama3-V-2_5-int4" # or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
# ATTENTION: specify the path to your training data, which should be a json file
# consisting of a list of conversations (see the sketch after this script).
# See the section for finetuning in README for more information.
DATA="/work/MiniCPM-V/minicpm_data/data/train.json"
EVAL_DATA="/work/MiniCPM-V/minicpm_data/eval/eval.json"
LLM_TYPE="llama3"
# if using openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
# if using openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3

MODEL_MAX_Length=2048 # if conducting multi-image SFT, please set MODEL_MAX_Length=4096

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_llm false \
    --use_lora true \
    --q_lora true \
    --tune_vision true \
    --lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)" \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output__lora \
    --logging_dir output/output_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero3.json \
    --report_to "tensorboard" # wandb
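
For completeness, DATA above points to a JSON file holding a list of conversations. The sketch below shows a hypothetical minimal entry; the field names ("id", "image", "conversations", "role", "content") and the "<image>" placeholder are my assumption based on the repo README, not copied from my actual file, and the output path is made up for illustration.

# Hypothetical example only: writes a one-sample training JSON in the
# list-of-conversations layout described in the script comments above.
cat > ./example_train.json <<'EOF'
[
  {
    "id": "0",
    "image": "/path/to/your/image_0.jpg",
    "conversations": [
      {"role": "user", "content": "<image>\nWhat is shown in this picture?"},
      {"role": "assistant", "content": "A short reference answer for the picture."}
    ]
  }
]
EOF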

I am not sure whether this is caused by insufficient GPU memory.
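
As a quick sanity check (an illustrative command I put together, not taken from the logs above), the same kind of RuntimeError can be reproduced on any non-floating-point tensor without touching the GPU, which suggests a dtype problem with the quantized weights rather than a memory problem:

# Asking for gradients on an integer tensor fails immediately, with no GPU memory involved.
# The exact wording of the message can differ slightly between torch versions.
python -c "import torch; torch.zeros(2, dtype=torch.int8).requires_grad_(True)"
# RuntimeError: only Tensors of floating point ... dtype can require gradients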

Expected Behavior

Resolve the problem described above.

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

KeepFaithMe · Sep 07 '24 13:09

Hi, this happens because your QLoRA run also enables training of weights outside the LoRA layers. What you need to do is set --tune_llm false and --tune_vision false.
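
In terms of the script above, that means only the LoRA adapters stay trainable when --q_lora is on. A minimal sketch of the relevant flags (a fragment, not a full command; everything else in the torchrun invocation stays as posted):

# Relevant lines inside the torchrun command in finetune_lora.sh
# (all other arguments stay exactly as in the script above):
    --use_lora true \
    --q_lora true \
    --tune_llm false \
    --tune_vision false \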

LDLINGLINGLING · Sep 09 '24 01:09

Thank you very much for your reply! Following your suggestion, I set tune_vision and tune_llm to false, but I still get the following error.

[2024-09-09 12:51:12,210] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
[2024-09-09 12:51:13,429] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-09-09 12:51:13,429] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
WARNING:root:FSDP or ZeRO3 are not incompatible with QLoRA.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.99s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Currently using LoRA for fine-tuning the MiniCPM-V model.
{'Total': 5528713456, 'Trainable': 668901376}
llm_type=llama3
Loading data...
max_steps is given, it will override any value given in num_train_epochs
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.04164481163024902 seconds
Parameter Offload: Total persistent parameters: 747760 in 354 params
  0%|          | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/work/MiniCPM-V/finetune/finetune.py", line 299, in <module>
    train()
  File "/work/MiniCPM-V/finetune/finetune.py", line 289, in train
    trainer.train()
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/work/MiniCPM-V/finetune/trainer.py", line 199, in training_step
    loss = self.compute_loss(model, inputs)
  File "/work/MiniCPM-V/finetune/trainer.py", line 23, in compute_loss
    outputs = self.model.base_model(data = inputs, use_cache=False)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/modeling_minicpmv.py", line 164, in forward
    vllm_embedding, vision_hidden_states = self.get_vllm_embedding(data)
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/modeling_minicpmv.py", line 100, in get_vllm_embedding
    vision_embedding = self.resampler(vision_embedding, tgt_sizes)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/peft/utils/other.py", line 264, in forward
    return self.modules_to_save[self.active_adapter](*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5-int4/resampler.py", line 150, in forward
    out = self.attn(
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/nn/functional.py", line 5413, in multi_head_attention_forward
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat2 must be a matrix, got 1-D tensor
  0%|          | 0/10000 [00:00<?, ?it/s]
[2024-09-09 12:51:25,461] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 349) of binary: /root/miniconda3/envs/minicpm/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/minicpm/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/minicpm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

finetune.py FAILED

Failures: <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2024-09-09_12:51:25
  host      : 555898d76c84
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 349)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Here is a link to my data, shared via Baidu Netdisk: minicpm_data, link: https://pan.baidu.com/s/1ou3NKsslCPlBxtvdtcYX5A, extraction code: khlu. About the data: it contains only two images and is meant purely to test whether fine-tuning runs at all. I uploaded it to rule out the dataset as the cause. Judging from the current error, the data looks fine (though I cannot be completely sure).

KeepFaithMe · Sep 09 '24 13:09

Hi, this is because PyTorch's MultiheadAttention, i.e. the attn module of the resampler, does not work with DeepSpeed ZeRO-3 training. You can try training with ZeRO-2 plus offload instead.
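
For reference, a sketch of what that switch could look like. The file name ds_config_zero2_offload.json is hypothetical (the repo may already ship a ZeRO-2 config you can adapt), and the keys are standard DeepSpeed options whose "auto" values are resolved by the HF Trainer:

# Write a hypothetical ZeRO-2 + CPU-offload DeepSpeed config next to finetune.py.
cat > ds_config_zero2_offload.json <<'EOF'
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "fp16": { "enabled": "auto" },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "gradient_clipping": "auto"
}
EOF
# Then point the launcher at it instead of ZeRO-3:
#   --deepspeed ds_config_zero2_offload.json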

qyc-98 · Sep 25 '24 13:09