torch.nn.LayerNorm mismatches in nightly.
Describe the bug: torch.nn.LayerNorm mismatches in nightly, but matches in 1.12.1.
Urgency: None.
System information: nightly torch, nightly onnxruntime.
To Reproduce
import io
import torch
from onnxruntime import InferenceSession, SessionOptions
model = torch.nn.LayerNorm([10, 10])
x = torch.randn(20, 5, 10, 10)
torch_out = model(x)
model_onnx = io.BytesIO()
torch.onnx.export(
    model.eval(),
    x,
    model_onnx,
    input_names=["input"],  # name the input explicitly so it matches the feed dict below
    opset_version=14,
)
sess = InferenceSession(model_onnx.getvalue(), SessionOptions(), providers=['CUDAExecutionProvider'])
ort_out = sess.run(None, {"input":x.numpy()})
torch.testing.assert_close([torch_out.detach().numpy()], ort_out, rtol=1e-3, atol=1e-7)
Mismatched elements: 9974 / 10000 (99.7%)
Greatest absolute difference: 1.6739110946655273 at index (5, 4, 2, 0) (up to 1e-07 allowed)
Greatest relative difference: 14660.064374461228 at index (12, 1, 8, 7) (up to 0.001 allowed)
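One way to narrow this down (a minimal sketch, not part of the original report; it reuses model_onnx, x, and torch_out from the repro above) is to run the same exported model on the CPU execution provider as well and compare both providers against PyTorch:
import numpy as np
from onnxruntime import InferenceSession, SessionOptions

# Reuses model_onnx, x, and torch_out from the repro above.
reference = torch_out.detach().numpy()
for provider in ["CPUExecutionProvider", "CUDAExecutionProvider"]:
    sess = InferenceSession(model_onnx.getvalue(), SessionOptions(), providers=[provider])
    out = sess.run(None, {"input": x.numpy()})[0]
    # If only the CUDA provider diverges, the issue is in the CUDA kernels or in a
    # graph optimization that only the CUDA path triggers.
    print(provider, "max abs diff:", np.abs(out - reference).max())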
@wangyems @tianleiwu for comments
It does not reproduce on my machine. I used the latest nightly versions on Python 3.8 and Ubuntu 18.04:
PyTorch: 1.13.0.dev20220830+cu113 from pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu113
ort-nightly-gpu: 1.13.0.dev20220830001
Thanks for taking a look. I think I might have mixed up the CUDA versions between these two. I will check again this week.
I kept the exported ONNX model and ran it with both onnxruntime==1.12.1 and nightly onnxruntime (built with CUDA 11.6): layer_norm.zip
The results don't align.
This is how I built it:
./build.sh --config RelWithDebInfo --enable_training --use_cuda --cuda_home /usr/local/cuda/ --cudnn_home /usr/local/cuda/ --build_wheel --parallel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=70 --cuda_version=11.6
Note that opset 17 is aligned, since it uses the newly supported LayerNormalization op, but opsets 9-16 are all mismatched (they are decomposed into ReduceMean/Div nodes).
cc @justinchuby if you have more insight to provide.
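To confirm the decomposition described above, here is a small sketch (assuming the model and x from the repro) that exports at two opsets and lists the operators the exporter emits:
import io
import onnx
import torch

def exported_ops(opset_version):
    buf = io.BytesIO()
    torch.onnx.export(model.eval(), x, buf, input_names=["input"], opset_version=opset_version)
    graph = onnx.load_model_from_string(buf.getvalue()).graph
    return sorted({node.op_type for node in graph.node})

print(14, exported_ops(14))  # expected: ReduceMean/Sub/Pow/Div/... decomposition
print(17, exported_ops(17))  # expected to include LayerNormalization (needs a recent torch)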
Possibly caused by --enable_training. In my test, I did not enable training.
Indeed, without --enable_training the issue goes away. Is the mismatch with --enable_training expected behavior? Why should there be a difference at all?
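One way to see what the two builds actually run (a sketch, assuming the model_onnx buffer from the repro; the output path is arbitrary) is to dump the graph after onnxruntime's own optimizations from each build and diff the results:
from onnxruntime import InferenceSession, SessionOptions

so = SessionOptions()
so.optimized_model_filepath = "layer_norm_optimized.onnx"  # arbitrary output path
InferenceSession(model_onnx.getvalue(), so, providers=["CUDAExecutionProvider"])
# Inspect the dumped model (e.g. in Netron) from the --enable_training build and from a
# stock build; a difference in the applied fusions (such as a LayerNormalization fusion)
# would be one possible source of the numerical mismatch.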
@titaiwangms is this problem resolved? I don't think the mismatch is expected when enable_training is turned on. Are you still seeing this mismatch?