
[Bug]: PyTorch nightly produces tensor grad warning

Open maedtb opened this issue 8 months ago • 3 comments

What happened?

Running the latest PyTorch nightly produces a warning from OneTrainer's AdditionalEmbeddingWrapper.py at the start of training. The warning appears to be new in the latest nightly build, and it may point to a bug in the current OneTrainer code. Someone should sanity-check whether the warning indicates a real problem; if it's a false positive in PyTorch, feel free to close this issue.

What did you expect would happen?

No warning messages triggered by PyTorch.

Relevant log output

/opt/onetrainer/modules/module/AdditionalEmbeddingWrapper.py:32: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
Consider using tensor.detach() first. (Triggered internally at /pytorch/aten/src/ATen/native/Scalar.cpp:22.)
  self.orig_median_norm = torch.norm(self.orig_module.weight, dim=1).median().item()

Generate and upload debug_report.log

No response
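For context, here is a minimal standalone sketch (not the actual OneTrainer code; the shapes are arbitrary) that triggers the same warning on an affected nightly:

    import torch

    weight = torch.nn.Parameter(torch.randn(8, 4))   # a Parameter has requires_grad=True by default
    norms = torch.norm(weight, dim=1)                # grad-tracking tensor of shape (8,)
    median_norm = norms.median().item()              # scalar conversion of a grad-tracking tensor warns here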

maedtb · Apr 13 '25 09:04

I know you don't actively support running on nightly builds, but I'm reporting this because it might be an undetected issue that also affects the supported PyTorch version. If it's not a real issue, great.

maedtb · Apr 13 '25 09:04

It's not an issue in this case. Can you check if replacing the line with this removes the warning?

    self.orig_median_norm = torch.norm(self.orig_module.weight, dim=1).median().detach().item()
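For reference, a minimal sketch of why detach() silences it (arbitrary shapes, not the real module):

    import torch

    w = torch.nn.Parameter(torch.randn(8, 4))
    # detach() returns a tensor sharing the same data but with requires_grad=False,
    # so .item() no longer converts a grad-tracking tensor to a scalar.
    value = torch.norm(w, dim=1).median().detach().item()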

Nerogar · Apr 13 '25 10:04

That made the warning go away for that line, but PyTorch then complained about the accumulated loss line in GenericTrainer as well. There may be other spots, but I'm not currently running into them.

I can submit this PR if you think these two "fixes" are reasonable: https://github.com/Nerogar/OneTrainer/compare/master...maedtb:OneTrainer:tensor-needs-detach-warning
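The loss change follows the same pattern; a rough standalone sketch (the real GenericTrainer code differs):

    import torch

    model = torch.nn.Linear(4, 1)
    accumulated_loss = 0.0
    for _ in range(3):
        loss = model(torch.randn(2, 4)).pow(2).mean()  # loss still requires grad here
        loss.backward()
        # detach before item() so the scalar conversion never sees requires_grad=True
        accumulated_loss += loss.detach().item()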

maedtb · Apr 14 '25 01:04

This doesn't happen with torch 2.7.1, so apparently they decided against this warning? The issue can be closed in that case.

dxqb · Jun 20 '25 20:06