[ONNX] Support exporting to the ONNX format
Is your feature request related to a problem? Please describe.
There is currently no ONNX export solution for deployment.
Solutions
This repo is developed in PyTorch, while some inference infrastructure prefers the ONNX format.
- Provide an ONNX export method (a rough sketch follows this list).
- Provide a demo for ONNX inference.
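A rough sketch of what the export method could look like, assuming the checkpoint loads through the Hugging Face `transformers` API with `trust_remote_code`; the checkpoint name, example input, and opset below are illustrative, and the repo's custom modeling code (rotary embeddings, KV cache) may still need export-specific workarounds:

```python
# Sketch only: exports the plain forward pass (no past_key_values),
# in fp32 to avoid fp16 numeric surprises during tracing.
import torch
from transformers import AutoModel, AutoTokenizer

name = "THUDM/chatglm2-6b"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True).float().eval()
model.config.return_dict = False  # tuple outputs trace more cleanly

input_ids = tokenizer("hello", return_tensors="pt")["input_ids"]

torch.onnx.export(
    model,
    (input_ids,),
    "chatglm2-6b.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "logits": {0: "batch", 1: "seq"},
    },
    opset_version=17,
)
```

Autoregressive generation would additionally need the KV cache exposed as explicit past/present inputs and outputs, which is the harder part of a real solution.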
Additional context
No response
Found a useful resource: https://tpumlir.org/en/2023/07/10/chatglm2-6b-jie-xi-yu-tpu-bu-shu.html
Good luck
When exporting a model from PyTorch to ONNX with float16 precision, there is a significant difference in the output of the following operator:

```python
# Attention heads [sq, b, h] --> [sq, b, (np * 3 * hn)]
mixed_x_layer = self.query_key_value(hidden_states)
```

torch version: 2.1.0a0+b5021ba
onnx version: 1.14.0
onnxruntime-gpu version: 1.15.1
opset: 17
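For what it's worth, the divergence can be quantified in isolation by exporting a stand-alone projection of the same shape and comparing PyTorch against onnxruntime-gpu directly; the layer sizes below are stand-ins for `query_key_value`, not values taken from the repo:

```python
# Sketch: measure fp16 PyTorch-vs-ONNX Runtime divergence on a lone
# linear layer shaped like query_key_value ([sq, b, h] -> [sq, b, 3h]).
import numpy as np
import torch
import onnxruntime as ort

torch.manual_seed(0)
proj = torch.nn.Linear(4096, 3 * 4096).half().cuda().eval()
x = torch.randn(8, 1, 4096, dtype=torch.float16, device="cuda")

with torch.no_grad():
    ref = proj(x).float().cpu().numpy()

torch.onnx.export(proj, (x,), "qkv.onnx",
                  input_names=["x"], output_names=["y"], opset_version=17)

sess = ort.InferenceSession("qkv.onnx", providers=["CUDAExecutionProvider"])
out = sess.run(None, {"x": x.cpu().numpy()})[0].astype(np.float32)

# A gap well above fp16 resolution (~1e-3 relative) suggests the two
# runtimes accumulate the matmul in different orders/precisions,
# rather than the graph itself being broken.
print("max abs diff:", np.abs(ref - out).max())
```

If the gap is in that range, exporting in fp32 and converting the weights afterwards (or keeping the sensitive matmuls in fp32) is the usual mitigation.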
Any updates on this issue? I still can't export ChatGLM2 to an ONNX model.