onnx2torch
running float16 model on the CPU
Thank you for this library, really great tool!
I have a mixed-precision ONNX model that relies on `OnnxCast` nodes here and there. It works fine with GPU inference, but running it on the CPU raises various issues. Specifically, I get the following:
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
This makes sense: some of the layers use half precision, and those kernels are not implemented for the CPU runtime. So the next logical step is to cast the model parameters to float32 using `model.float()`. However, this now yields the following error:
```
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
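That is, roughly (same placeholder path and shape as above):

```python
import torch
from onnx2torch import convert

# Cast all parameters and buffers to float32 after conversion.
model = convert('model.onnx').float()

x = torch.randn(1, 3, 224, 224)  # placeholder input shape
out = model(x)  # RuntimeError: Input type (c10::Half) and bias type (float) ...
```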
I didn't dig too deep into this, but my hypothesis is that the `OnnxCast` nodes are still converting to float16 even though all the model parameters are now float32. I tried simply modifying the FX graph and turning the `OnnxCast` nodes into no-ops, but I haven't been able to make that work yet.
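Roughly, what I had in mind was something like this (a sketch I haven't gotten working; the `OnnxCast` import path is an assumption and may differ between onnx2torch versions):

```python
from torch import nn
from onnx2torch import convert
# Assumed import path for OnnxCast; may vary by onnx2torch version.
from onnx2torch.node_converters.cast import OnnxCast

model = convert('model.onnx')  # placeholder path

# Collect every OnnxCast submodule, then swap each one for nn.Identity so the
# FX graph's call_module nodes resolve to a no-op instead of a dtype cast.
cast_names = [name for name, module in model.named_modules()
              if isinstance(module, OnnxCast)]
for name in cast_names:
    parent_name, _, attr_name = name.rpartition('.')
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, attr_name, nn.Identity())

# With the casts gone, .float() should leave everything in float32.
# Caveat: this removes *all* casts, including any integer casts (e.g. for
# indices) the graph might legitimately need, so filtering by target dtype
# would probably be safer.
model = model.float()
```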
Maybe adding an argument to `onnx2torch.convert` would help with this scenario, for example something like `cast=False`, with `cast=True` as the default. Just a thought.
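Usage could look something like this (purely hypothetical, the `cast` argument does not exist today):

```python
from onnx2torch import convert

# Hypothetical flag: skip emitting OnnxCast nodes during conversion,
# leaving dtype handling entirely to the caller.
model = convert('model.onnx', cast=False)
model = model.float()  # now safe for CPU inference
```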