transformer-deploy
t5_bf16 notebook fails with [ONNXRuntimeError] : 10 : INVALID_GRAPH
I'm running the t5_bf16 notebook with the T0_3B model. Everything works great until
enc_fp16_onnx = create_model_for_provider(encoder_model_path, "CUDAExecutionProvider", log_severity=3)
enc_fp16_onnx_binding: IOBinding = enc_fp16_onnx.io_binding()
dec_onnx = create_model_for_provider(dec_if_model_path, "CUDAExecutionProvider", log_severity=3)
dec_onnx_binding: IOBinding = dec_onnx.io_binding()
causes
InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from ./test-enc/model.onnx failed:This is an invalid model. Type Error: Type 'tensor(bfloat16)' of input parameter (onnx::Pow_398) of operator (Pow) in node (Pow_138) is invalid.
EDIT 8/1: This is odd, as ONNX claims to support Pow in bf16 as of https://github.com/onnx/onnx/pull/3412. The linked PR suggests that only opset 15+ supports a bf16 exponent in Pow. I upgraded the opset version to 15 in convert_to_onnx(), and now I get a RuntimeError when calling create_model_for_provider:
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/optimizer/optimizer_execution_frame.cc:75 onnxruntime::OptimizerExecutionFrame::Info::Info(const std::vector<const onnxruntime::Node*>&, const InitializedTensorSet&, const onnxruntime::Path&, const onnxruntime::IExecutionProvider&, const std::function<bool(const std::basic_string
&)>&) [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : UnpackTensor: the pre-allocate size does not match the size in proto
I'm running PyTorch 1.11.0 and onnx 1.12.0 with onnxruntime 1.12.0. Your help would be greatly appreciated @pommedeterresautee
Hardware: NVIDIA A10 w/ 24GB and hardware bf16 support
@pommedeterresautee The t5_bf16 notebook doesn't work with t5-3b either, for that matter. It errors on the same line as T0_3B, but for a different reason:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor onnx::MatMul_2878 failed.UnpackTensor: the pre-allocate size does not match the size in proto
This is quite important for the project I'm working on, and it would be great if you could help ASAP. Thank you in advance
I will check in the coming days, but TBH I'm not sure you will like BF16 accuracy; it's quite low compared to FP16, which implies adding casts everywhere (it was our hope to not have to do that anymore). The catch is that models trained in BF16 accumulate in FP32, so in the end you need good precision to reproduce their results. Range kills FP16 and precision kills BF16 on deep nets; in the end casting is the only way. One thing that broke many things is PyTorch 1.12.0 (it changed the way it stores some values in ONNX); we are pushing patches here and there but have not retried those notebooks.
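The range-vs-precision trade-off above can be illustrated without any ML stack. A pure-Python sketch that emulates bfloat16 by truncating a float32 to its top 16 bits (the helper names are mine, not from the library):

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision; out-of-range values overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.inf

# Range kills FP16: 70000 is representable in BF16 but overflows FP16
print(to_bf16(70000.0))  # 69632.0 (coarse, but finite)
print(to_fp16(70000.0))  # inf

# Precision kills BF16: near 1.0, FP16 is much closer to the true value
print(to_bf16(1.01))  # 1.0078125
print(to_fp16(1.01))  # 1.009765625
```

Both formats spend 16 bits, but BF16 gives them to the exponent (FP32's full range, 8-bit mantissa) while FP16 gives them to the mantissa (11-bit significand, max ~65504), which is why deep nets trained with FP32 accumulation tend to need casts either way.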
One thing you may want to try is exporting ONNX from PyTorch with AMP enabled (FP16 and BF16 are both supported). In this video at 6'30 they say it should work in the latest PyTorch; I haven't had time to try it myself. https://www.youtube.com/watch?v=R2mUT_s0PbE
If you do, would be very interested to know if it worked for you.
Also found this issue about this possibility, related bug and fixes: https://github.com/pytorch/pytorch/issues/72494
Seems to work... hope it helps in your project.
Thanks @pommedeterresautee. I couldn't find any more information on how I could use amp in the export process.
I'm actually using PyTorch 1.11 for the export (and onnx 1.12.0 and onnxruntime-gpu 1.12.0). The odd thing is that t5-small works in the t5_bf16 notebook but not t5-3b. I'd appreciate your help here.
@pommedeterresautee Part of the issue seems to be that the notebooks are generally broken with the latest version of the library and its dependencies. I've created a separate issue about that, #130.