Baichuan-13B RuntimeError: element 0 of tensors does not require grad and does not have a grad

I used peft to fine tune baichuan llm via lora way. I ran the same fine-tuning code as 7B for 13B, but something went wrong:

/opt/conda/envs/trl/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/opt/conda/envs/trl/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[21], line 1
----> 1 trainer.train()
      2 model.save_pretrained("baichuan13b/baichuan13b/")

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1537, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1532     self.model_wrapped = self.model
   1534 inner_training_loop = find_executable_batch_size(
   1535     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1536 )
-> 1537 return inner_training_loop(
   1538     args=args,
   1539     resume_from_checkpoint=resume_from_checkpoint,
   1540     trial=trial,
   1541     ignore_keys_for_eval=ignore_keys_for_eval,
   1542 )

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1802, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1799     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   1801 with self.accelerator.accumulate(model):
-> 1802     tr_loss_step = self.training_step(model, inputs)
   1804 if (
   1805     args.logging_nan_inf_filter
   1806     and not is_torch_tpu_available()
   1807     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1808 ):
   1809     # if loss is nan or inf simply add the average of previous logged losses
   1810     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:2658, in Trainer.training_step(self, model, inputs)
   2656         scaled_loss.backward()
   2657 else:
-> 2658     self.accelerator.backward(loss)
   2660 return loss.detach() / self.args.gradient_accumulation_steps

File /opt/conda/envs/trl/lib/python3.10/site-packages/accelerate/accelerator.py:1842, in Accelerator.backward(self, loss, **kwargs)
   1840     return
   1841 elif self.scaler is not None:
-> 1842     self.scaler.scale(loss).backward(**kwargs)
   1843 else:
   1844     loss.backward(**kwargs)

File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/_tensor.py:487, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    477 if has_torch_function_unary(self):
    478     return handle_torch_function(
    479         Tensor.backward,
    480         (self,),
   (...)
    485         inputs=inputs,
    486     )
--> 487 torch.autograd.backward(
    488     self, gradient, retain_graph, create_graph, inputs=inputs
    489 )

File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

could you please help me to fix this, thx

Jul 11 '23 09:07 vpegasus

Can you provide Code snippets to reproduce ?

Jul 11 '23 09:07 mmmans

Can you provide Code snippets to reproduce ?

OK， here is the code snippets:

model_id ='baichuan-inc/Baichuan-13B-Base'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True,use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_8bit=True,device_map='auto', trust_remote_code=True)
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM,target_modules=["W_pack", "o_proj", "gate_proj", "up_proj", "down_proj"], inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
...
trainer.train()

In, addition:

peft                          0.4.0.dev0
accelerate                    0.21.0.dev0
bitsandbytes                  0.39.0
transformers                  4.31.0.dev0
transformers-stream-generator 0.0.4

Jul 11 '23 10:07 vpegasus

I also encountered this issue, if there are any progress, please let me know, thx.

Jul 12 '23 13:07 EricLee8

@vpegasus

I also meet this error. you could add model.enable_input_require_grads() and try it

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True,use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_8bit=True,device_map='auto', trust_remote_code=True)
model.enable_input_require_grads()

Jul 13 '23 04:07 supinyu

I try it and I got this error

/workspace/work/LMFlow-main-baichuan/examples/finetune.py:66 in

63                                                                                              
64                                                                                              
65 if __name__ == '__main__':

❱ 66 main()
67

/workspace/work/LMFlow-main-baichuan/examples/finetune.py:59 in main

56     print("model", model_args)                                                               
57     print("data", data_args)                                                                 
58     print("pipe", pipeline_args)

❱ 59 model = AutoModel.get_model(model_args)
60
61 # Finetuning
62 tuned_model = finetuner.tune(model=model, dataset=dataset)

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/lmflow-0.0.1-py3.9.egg/lmflow/models/
auto_model.py:16 in get_model

13     def get_model(self, model_args, *args, **kwargs):                                        
14         arch_type = model_args.arch_type                                                     
15         if arch_type == "decoder_only":

❱ 16 return HFDecoderModel(model_args, *args, **kwargs)
17 elif arch_type == "text_regression":
18 return TextRegressionModel(model_args, *args, **kwargs)
19 elif arch_type == "encoder_decoder":

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/lmflow-0.0.1-py3.9.egg/lmflow/models/
hf_decoder_model.py:233 in init

230             #print("debug", embedding_size, len(tokenizer))                                 
231             if len(tokenizer) > embedding_size:                                             
232                 model.resize_token_embeddings(len(tokenizer))

❱ 233 model.enable_input_require_grads()
234 self.config = config
235 self.backend_model = model
236 self.tune_strategy = tune_strategy

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/transformers/modeling_utils.py:1174
in enable_input_require_grads

1171         def make_inputs_require_grads(module, input, output):                              
1172             output.requires_grad_(True)                                                    
1173

❱ 1174 self._require_grads_hook = self.get_input_embeddings().register_forward_hook(mak
1175
1176 def disable_input_require_grads(self):
1177 """

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/transformers/modeling_utils.py:1192
in get_input_embeddings

1189         base_model = getattr(self, self.base_model_prefix, self)                           
1190         print("debug", base_model, self.base_model_prefix, self)                           
1191         if base_model is not self:

❱ 1192 return base_model.get_input_embeddings()
1193 else:
1194 raise NotImplementedError
1195

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/transformers/modeling_utils.py:1194
in get_input_embeddings

1191         if base_model is not self:                                                         
1192             return base_model.get_input_embeddings()                                       
1193         else:

❱ 1194 raise NotImplementedError
1195
1196 def set_input_embeddings(self, value: nn.Module):
1197 """
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ NotImplementedError

Jul 13 '23 12:07 delltower

we've made update today. Pull the newest version of code from Hugging Face. And try our tested third party fine-tuning tool in README if you're interested.

Jul 18 '23 12:07 GradientGuru

I encounter the same error. updating the latest modeling_baichuan.py still not work. Then I add model.enable_input_require_grads() , it works. Besides, my lora_target_modules only has 'o_proj' , maybe you can try less target_modules.

Jul 19 '23 15:07 PromptExpert

Still have the problem +1

Jul 27 '23 03:07 zyong812

Baichuan-13B Baichuan-13B copied to clipboard

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Baichuan-13B
Baichuan-13B copied to clipboard