AMD LayerNorm Seg Fault in PyTorch
❓ The question
Hi, I'm from the PyTorch team, and I recently became aware that you needed a custom layer norm because torch.nn.LayerNorm would seg fault without a bias: https://github.com/allenai/OLMo/blob/cf121084409d844e4f540b7d08b8f37bbe1eec98/olmo/model.py#L203. I wonder if this has already been resolved? I tried this repro on current PyTorch and it seems to run just fine:
```python
import torch

# Make sure this is a ROCm build of PyTorch
assert torch.version.hip is not None

input = torch.randn(10, 10, 10).cuda()
ln = torch.nn.LayerNorm([10, 10], bias=False).cuda()

# Forward and backward with bias=False, the configuration that reportedly seg faulted
ln(input).sum().backward()
print(ln.weight.grad)
assert ln.bias is None
```
@dirkgr may know more about this.
To be clear, you attempted to repro this on current PyTorch on AMD GPUs?
As far as I know, this was a ROCm-only problem, and AMD has already fixed it. If it works with Torch + ROCm 5.7, I would consider this resolved. In fact, we could get rid of that extra class.
We don't run models with this LN configuration anymore anyway, so it doesn't matter for us, but it was quite a difficult bug to track down at the time.
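For reference, the "extra class" mentioned above is a custom layer norm that works around the problem by never handing a bias tensor to the underlying kernel. The sketch below is an illustrative reimplementation of that idea, not the actual OLMo code (the name BiasFreeLayerNorm is made up here): it keeps a learnable weight, registers no bias, and calls F.layer_norm with bias=None.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiasFreeLayerNorm(nn.Module):
    """Layer norm with an elementwise weight but no bias parameter (illustrative sketch)."""

    def __init__(self, normalized_shape, eps: float = 1e-5):
        super().__init__()
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.normalized_shape = tuple(normalized_shape)
        self.eps = eps
        # Learnable scale only; no bias is ever created or passed to the kernel.
        self.weight = nn.Parameter(torch.ones(self.normalized_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.layer_norm(x, self.normalized_shape, self.weight, None, self.eps)


# Usage mirrors the repro above:
#   ln = BiasFreeLayerNorm([10, 10]).cuda()
#   ln(torch.randn(10, 10, 10).cuda()).sum().backward()
```

If torch.nn.LayerNorm(bias=False) now works on ROCm, as the repro above suggests, a class like this is no longer needed.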