Baichuan-13B
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I used peft to fine-tune the Baichuan LLM with LoRA.
I ran the same fine-tuning code on 13B that I used for 7B, but it failed with the following output:
/opt/conda/envs/trl/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/opt/conda/envs/trl/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[21], line 1
----> 1 trainer.train()
2 model.save_pretrained("baichuan13b/baichuan13b/")
File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1537, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1532 self.model_wrapped = self.model
1534 inner_training_loop = find_executable_batch_size(
1535 self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
1536 )
-> 1537 return inner_training_loop(
1538 args=args,
1539 resume_from_checkpoint=resume_from_checkpoint,
1540 trial=trial,
1541 ignore_keys_for_eval=ignore_keys_for_eval,
1542 )
File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1802, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1799 self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
1801 with self.accelerator.accumulate(model):
-> 1802 tr_loss_step = self.training_step(model, inputs)
1804 if (
1805 args.logging_nan_inf_filter
1806 and not is_torch_tpu_available()
1807 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1808 ):
1809 # if loss is nan or inf simply add the average of previous logged losses
1810 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:2658, in Trainer.training_step(self, model, inputs)
2656 scaled_loss.backward()
2657 else:
-> 2658 self.accelerator.backward(loss)
2660 return loss.detach() / self.args.gradient_accumulation_steps
File /opt/conda/envs/trl/lib/python3.10/site-packages/accelerate/accelerator.py:1842, in Accelerator.backward(self, loss, **kwargs)
1840 return
1841 elif self.scaler is not None:
-> 1842 self.scaler.scale(loss).backward(**kwargs)
1843 else:
1844 loss.backward(**kwargs)
File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/_tensor.py:487, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
477 if has_torch_function_unary(self):
478 return handle_torch_function(
479 Tensor.backward,
480 (self,),
(...)
485 inputs=inputs,
486 )
--> 487 torch.autograd.backward(
488 self, gradient, retain_graph, create_graph, inputs=inputs
489 )
File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
195 retain_graph = create_graph
197 # The reason we repeat same the comment below is that
198 # some Python versions print out the first line of a multi-line function
199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
201 tensors, grad_tensors_, retain_graph, create_graph, inputs,
202 allow_unreachable=True, accumulate_grad=True)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
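For reference, the same RuntimeError can be reproduced in plain PyTorch whenever the loss is not connected to any tensor that requires grad, which would be consistent with the checkpoint.py warning above ("None of the inputs have requires_grad=True"). The snippet below is only a minimal sketch of that mechanism, not the actual training code; the helper name `try_backward` and the tiny `Linear` layer are hypothetical stand-ins for a frozen base model.

```python
import torch


def try_backward(input_requires_grad: bool) -> str:
    """Run backward through a fully frozen layer and report what happens."""
    # Stand-in for a frozen base model: no parameter requires grad.
    frozen = torch.nn.Linear(4, 1)
    for p in frozen.parameters():
        p.requires_grad_(False)

    # If the input does not require grad either, nothing in the graph does,
    # so the loss ends up without a grad_fn.
    x = torch.randn(2, 4, requires_grad=input_requires_grad)
    loss = frozen(x).sum()

    try:
        loss.backward()
        return "ok"
    except RuntimeError as err:
        return str(err)


print(try_backward(False))  # raises inside, returns the error message
print(try_backward(True))   # backward succeeds
```

If this is what happens here (gradient checkpointing on top of frozen embeddings), calling `model.enable_input_require_grads()` before wrapping the model with peft might be worth trying. That is an assumption and has not been verified against Baichuan-13B.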
Could you please help me fix this? Thanks.