
Verification failure with bert_large model

Open djramic opened this issue 1 year ago • 0 comments

During tier 1 model testing on Navi4, a verification failure was encountered with the BERT large model. The issue also occurs on MI100.

Commands to reproduce failure:

  • migraphx-driver verify --onnx bert_large_uncased_1_fp16_gpu.onnx --fill1 input_ids --input-dim @input_ids 1 384 --batch 1

FAILED: bert_large_uncased_1_fp16_gpu.onnx
RMS Error: 0.0457421
Max diff: 4.33398
Mismatch at 0: -0.819824 != -0.64209

FAILED: bert_large_uncased_1_fp16_gpu.onnx
RMS Error: 0.120325
Max diff: 0.649536
Mismatch at 0: 0.865723 != 0.704102

  • migraphx-driver verify --onnx bert_large_mlperf.onnx --fp16 --fill1 input_ids --fill1 segment_ids --input-dim @input_ids 1 384

FAILED: bert_large_mlperf.onnx
RMS Error: -nan
Max diff: nan
Mismatch at 0: -5.3692 != -nan
Non finite number found in target at 0: -nan

FAILED: bert_large_mlperf.onnx
RMS Error: -nan
Max diff: nan
Mismatch at 0: -4.31443 != -nan
Non finite number found in target at 0: -nan
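
For anyone triaging the numbers above, here is a minimal sketch of how such metrics can be computed from a reference output and a target output. It is not taken from the MIGraphX source: the function name `compare`, the `tol` parameter, and the exact RMS formula are assumptions for illustration, and the driver may normalize or threshold differently.

```python
import math


def compare(ref, target, tol=1e-3):
    """Hypothetical helper: summarize how far `target` is from `ref`.

    `tol` is an illustrative absolute tolerance, not the driver's threshold.
    """
    if len(ref) != len(target) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [r - t for r, t in zip(ref, target)]

    # Root-mean-square of the element-wise differences.
    # A single NaN in the target makes the sum, and so the result, NaN.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # Python's max() does not reliably propagate NaN, so handle it explicitly.
    if any(math.isnan(d) for d in diffs):
        max_diff = math.nan
    else:
        max_diff = max(abs(d) for d in diffs)

    # First element whose difference is NaN or exceeds the tolerance.
    first_mismatch = next(
        (i for i, d in enumerate(diffs) if math.isnan(d) or abs(d) > tol),
        None,
    )

    # First NaN or infinity in the target output.
    first_non_finite = next(
        (i for i, t in enumerate(target) if not math.isfinite(t)),
        None,
    )

    return {
        "rms": rms,
        "max_diff": max_diff,
        "first_mismatch": first_mismatch,
        "first_non_finite": first_non_finite,
    }


if __name__ == "__main__":
    # Finite mismatch, as in the first command's output.
    print(compare([-0.819824], [-0.64209]))
    # NaN in the target, as in the second command's output.
    print(compare([-5.3692], [math.nan]))
```

Under these assumptions, a NaN in the target output is enough to turn both the RMS error and the max diff into NaN, so the second command's report says little about how close the remaining elements are.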

djramic, Aug 19 '24 09:08