Sophia
Issue on grad
File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 46, in step hessian_estimate = self.hutchinson(p, grad) File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 61, in hutchinson hessian_vector_product = torch.autograd.grad(grad.dot(u), p, retain_graph=True)[0] File "/home/phu/miniconda3/envs/ner-py38-conda-env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 303, in grad return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I also tried using torch.sum(grad * u) instead of grad.dot(u), but it did not work either.
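The error message suggests the gradient tensor handed to the Hutchinson estimator carries no autograd graph, which is what happens when it comes from a plain loss.backward() call: p.grad is detached, so a second autograd.grad through it fails regardless of whether grad.dot(u) or torch.sum(grad * u) is used. Below is a minimal sketch (not the repo's code; model, data, and variable names are placeholders) of how the Hessian-vector product can be computed when the first-order gradient is created with create_graph=True:

import torch

# Toy setup standing in for the real model and loss.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())

# The first-order gradient must keep its graph (create_graph=True);
# a detached grad has no grad_fn and triggers the RuntimeError above.
grads = torch.autograd.grad(loss, params, create_graph=True)

for p, g in zip(params, grads):
    # Rademacher probe vector of +/-1 for the Hutchinson estimate.
    u = torch.randint_like(p, high=2) * 2.0 - 1.0
    # Double backward: Hessian-vector product H u for this parameter.
    hvp = torch.autograd.grad(torch.sum(g * u), p, retain_graph=True)[0]
    # Elementwise u * (H u) estimates the Hessian diagonal.
    hessian_diag_estimate = u * hvp

If the optimizer only sees p.grad produced by loss.backward(), one workaround is to compute the gradients yourself with create_graph=True (as above) before calling step, so the Hessian estimate has a graph to differentiate through.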
Upvote & Fund
- We're using Polar.sh so you can upvote and help fund this issue.
- We receive the funding once the issue is completed & confirmed by you.
- Thank you in advance for helping prioritize & fund our backlog.
Just upgraded the package with the original implementation; upgrade with pip and try again!
Experiencing the same issue with DecoupledSophia on the current main version: git+https://github.com/kyegomez/Sophia.git@a4db3506fffdab3a06cd4dd07ff54fb311450980
I'm hitting the same issue in Megatron when training a distributed model...