
torch.nn.LayerNorm mismatches in nightly.

Open · titaiwangms opened this issue · 7 comments

Describe the bug
torch.nn.LayerNorm output mismatches in nightly, but matches in 1.12.1.

Urgency
None.

System information
Nightly torch, nightly onnxruntime.

To Reproduce

import io
import torch
from onnxruntime import InferenceSession, SessionOptions

model = torch.nn.LayerNorm([10, 10])
x = torch.randn(20, 5, 10, 10)
torch_out = model(x)

model_onnx = io.BytesIO()
torch.onnx.export(
    model.eval(),
    x,
    model_onnx,
    input_names=["input"],  # name the input explicitly so the feed below matches
    opset_version=14,
)
sess = InferenceSession(model_onnx.getvalue(), SessionOptions(), providers=['CUDAExecutionProvider'])
ort_out = sess.run(None, {"input":x.numpy()})

torch.testing.assert_close([torch_out.detach().numpy()], ort_out, rtol=1e-3, atol=1e-7)
Output:

Mismatched elements: 9974 / 10000 (99.7%)
Greatest absolute difference: 1.6739110946655273 at index (5, 4, 2, 0) (up to 1e-07 allowed)
Greatest relative difference: 14660.064374461228 at index (12, 1, 8, 7) (up to 0.001 allowed)

titaiwangms · Aug 28 '22 21:08
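
A quick way to check whether the mismatch is specific to the CUDA kernels is to rerun the same exported model on the CPU execution provider. A minimal sketch, reusing model_onnx, x, and torch_out from the repro above:

# Sketch: rerun the exported model on the CPU execution provider.
# Reuses model_onnx, x, and torch_out from the repro above.
cpu_sess = InferenceSession(model_onnx.getvalue(), SessionOptions(), providers=["CPUExecutionProvider"])
cpu_out = cpu_sess.run(None, {"input": x.numpy()})
torch.testing.assert_close([torch_out.detach().numpy()], cpu_out, rtol=1e-3, atol=1e-7)

If this passes while the CUDA run fails, the problem is in the CUDA kernels rather than the export.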

@wangyems @tianleiwu for comments

hariharans29 · Aug 29 '22 18:08

It does not reproduce on my machine. I used the latest nightly versions on Python 3.8 and Ubuntu 18.04:

PyTorch: 1.13.0.dev20220830+cu113, installed via pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu113
ort-nightly-gpu: 1.13.0.dev20220830001

tianleiwu · Aug 31 '22 03:08
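
When comparing environments like this, a short version check helps rule out CUDA or wheel skew before comparing numbers; a minimal sketch:

# Sketch: print the versions and providers actually in use.
import torch
import onnxruntime as ort

print("torch:", torch.__version__, "| torch CUDA:", torch.version.cuda)
print("onnxruntime:", ort.__version__)
print("available providers:", ort.get_available_providers())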

Thanks for taking a look. I might have mixed up the CUDA versions between the two. I will check again this week.

titaiwangms · Aug 31 '22 15:08

I kept the ONNX model and ran it with both onnxruntime==1.12.1 and nightly onnxruntime (built with CUDA 11.6): layer_norm.zip

The results don't align.

This is how I built it:

./build.sh --config RelWithDebInfo --enable_training --use_cuda --cuda_home /usr/local/cuda/ --cudnn_home /usr/local/cuda/ --build_wheel --parallel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70 --cuda_version=11.6

Note that opset 17 matches, since it uses the newly supported LayerNormalization operator, but opsets 9-16 are all mismatched (there LayerNorm is decomposed into ReduceMean/Div and related nodes).

titaiwangms · Sep 06 '22 22:09
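
To see the difference concretely, the exported graphs can be compared across opsets. A minimal sketch (assuming the onnx package is installed and a PyTorch new enough to export opset 17) that should print a single LayerNormalization node at opset 17 and the ReduceMean/Div decomposition at opset 14:

# Sketch: compare the exported graph ops at opset 14 vs. 17.
import io
import onnx
import torch

model = torch.nn.LayerNorm([10, 10]).eval()
x = torch.randn(20, 5, 10, 10)

for opset in (14, 17):
    buf = io.BytesIO()
    torch.onnx.export(model, x, buf, opset_version=opset)
    graph = onnx.load_model_from_string(buf.getvalue()).graph
    print(opset, [node.op_type for node in graph.node])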

cc @justinchuby if you have more insight to provide.

titaiwangms · Sep 06 '22 22:09

Possibly caused by --enable_training. In my test, I did not enable training.

tianleiwu · Sep 06 '22 22:09

Indeed, without --enable_training the issue goes away. Is the mismatch with --enable_training expected behavior? Why would a difference be needed there?

titaiwangms · Sep 06 '22 22:09
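
One way to tell which kind of wheel is installed, under the assumption that training-enabled builds ship the onnxruntime.training submodule while inference-only builds do not:

# Sketch: detect whether the installed wheel was built with training enabled.
import onnxruntime as ort

print("onnxruntime:", ort.__version__)
try:
    import onnxruntime.training  # assumption: present only in training-enabled builds
    print("training-enabled build")
except ImportError:
    print("inference-only build")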

@titaiwangms is this problem resolved? I don't think the mismatch is expected when enable_training is turned on. Are you still seeing this mismatch?

baijumeswani · Dec 16 '22 22:12