onnx2torch
running float16 model on the CPU
Thank you for this library, really great tool!
I have a mixed-precision ONNX model that relies on `OnnxCast` nodes here and there. It works fine with GPU inference, but running it on the CPU raises various issues. Specifically, I get the following:
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
This makes sense: some of the layers use half precision, and those kernels are not implemented for the CPU runtime. So the next logical step is to cast the model parameters to float32 using `model.float()`. However, this now yields the following error:
```
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
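That is, roughly (same placeholder path and shape as above):

```python
import torch
from onnx2torch import convert

# Cast all parameters and buffers to float32 after conversion.
model = convert('model.onnx').float()

x = torch.randn(1, 3, 224, 224)  # placeholder input shape
out = model(x)  # RuntimeError: Input type (c10::Half) and bias type (float) ...
```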
I didn't dig too deep into this, but my hypothesis is that the `OnnxCast` nodes are still converting to float16 even though all the model parameters are now float32. I tried simply modifying the FX graph and turning the `OnnxCast` nodes into no-ops, but I haven't been able to make that work yet.
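Roughly, what I had in mind was something like this (a sketch I haven't gotten working; the `OnnxCast` import path is an assumption and may differ between onnx2torch versions):

```python
from torch import nn
from onnx2torch import convert
# Assumed import path for OnnxCast; may vary by onnx2torch version.
from onnx2torch.node_converters.cast import OnnxCast

model = convert('model.onnx')  # placeholder path

# Collect every OnnxCast submodule, then swap each one for nn.Identity so the
# FX graph's call_module nodes resolve to a no-op instead of a dtype cast.
cast_names = [name for name, module in model.named_modules()
              if isinstance(module, OnnxCast)]
for name in cast_names:
    parent_name, _, attr_name = name.rpartition('.')
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, attr_name, nn.Identity())

# With the casts gone, .float() should leave everything in float32.
# Caveat: this removes *all* casts, including any integer casts (e.g. for
# indices) the graph might legitimately need, so filtering by target dtype
# would probably be safer.
model = model.float()
```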
Maybe adding an argument to `onnx2torch.convert` would help with this scenario, for example something like `cast=False`, with `cast=True` as the default. Just a thought.
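Usage could look something like this (purely hypothetical, the `cast` argument does not exist today):

```python
from onnx2torch import convert

# Hypothetical flag: skip emitting OnnxCast nodes during conversion,
# leaving dtype handling entirely to the caller.
model = convert('model.onnx', cast=False)
model = model.float()  # now safe for CPU inference
```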