Sophia
Issue on grad
File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 46, in step hessian_estimate = self.hutchinson(p, grad) File "/home/phu/Desktop/gatedtabtransformer/sophia_custom.py", line 61, in hutchinson hessian_vector_product = torch.autograd.grad(grad.dot(u), p, retain_graph=True)[0] File "/home/phu/miniconda3/envs/ner-py38-conda-env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 303, in grad return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I also tried using torch.sum(grad * u) instead of grad.dot(u), but it did not work either.
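The error message suggests the gradient tensor handed to the Hutchinson estimator carries no autograd graph, which is what happens when it comes from a plain loss.backward() call: p.grad is detached, so a second autograd.grad through it fails regardless of whether grad.dot(u) or torch.sum(grad * u) is used. Below is a minimal sketch (not the repo's code; model, data, and variable names are placeholders) of how the Hessian-vector product can be computed when the first-order gradient is created with create_graph=True:

import torch

# Toy setup standing in for the real model and loss.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())

# The first-order gradient must keep its graph (create_graph=True);
# a detached grad has no grad_fn and triggers the RuntimeError above.
grads = torch.autograd.grad(loss, params, create_graph=True)

for p, g in zip(params, grads):
    # Rademacher probe vector of +/-1 for the Hutchinson estimate.
    u = torch.randint_like(p, high=2) * 2.0 - 1.0
    # Double backward: Hessian-vector product H u for this parameter.
    hvp = torch.autograd.grad(torch.sum(g * u), p, retain_graph=True)[0]
    # Elementwise u * (H u) estimates the Hessian diagonal.
    hessian_diag_estimate = u * hvp

If the optimizer only sees p.grad produced by loss.backward(), one workaround is to compute the gradients yourself with create_graph=True (as above) before calling step, so the Hessian estimate has a graph to differentiate through.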
Upvote & Fund
- We're using Polar.sh so you can upvote and help fund this issue.
- We receive the funding once the issue is completed & confirmed by you.
- Thank you in advance for helping prioritize & fund our backlog.
Just upgraded the package with the original implementation; upgrade with pip and try again!
Experiencing the same issue with DecoupledSophia on the current main version: git+https://github.com/kyegomez/Sophia.git@a4db3506fffdab3a06cd4dd07ff54fb311450980
I'm hitting the same issue in Megatron when training a distributed model...