
[ONNX] support exporting to ONNX format

Open alanzhai219 opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe.

There is currently no ONNX export solution for deployment.

Solutions

This repo is developed in PyTorch, while some inference infrastructure prefers the ONNX format.

  • provide an ONNX export method (a rough sketch follows below).
  • provide a demo for ONNX inference.
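
A minimal sketch of what such an export plus a quick ONNX Runtime smoke test could look like, assuming the Hugging Face checkpoint THUDM/chatglm2-6b and plain torch.onnx.export. The model id, file name, and the use_cache workaround are assumptions, not something this repo ships; the custom modeling code (rotary embeddings, KV cache) may still break tracing, and the fp32 weights of a 6B model exceed the 2 GB protobuf limit, so the exporter will typically spill them into external data files.

```python
# Rough export sketch -- an assumed starting point, not a verified recipe.
import torch
import onnxruntime as ort
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm2-6b"  # assumption: the official Hugging Face checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).float().eval()
model.config.use_cache = False  # assumption: the custom config respects use_cache; the KV cache complicates tracing

# A dummy prompt is only used to trace the graph.
dummy = tokenizer("hello", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"],),
    "chatglm2-6b.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch", 1: "seq"}},
    opset_version=17,
)

# Quick smoke test of the exported graph with ONNX Runtime.
sess = ort.InferenceSession("chatglm2-6b.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"input_ids": dummy["input_ids"].numpy()})[0]
print(out.shape)
```

Exporting in fp32 first and converting or quantizing afterwards also tends to sidestep the fp16 accuracy issue discussed later in this thread.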

Additional context

No response

alanzhai219 avatar Jul 04 '23 11:07 alanzhai219

Found some useful resources: https://tpumlir.org/en/2023/07/10/chatglm2-6b-jie-xi-yu-tpu-bu-shu.html

Good luck

NewJerseyStyle avatar Aug 03 '23 10:08 NewJerseyStyle

When exporting the model from PyTorch to ONNX with float16 precision, the output of the following operation differs significantly between PyTorch and ONNX Runtime.

```python
# Attention heads [sq, b, h] --> [sq, b, (np * 3 * hn)]
mixed_x_layer = self.query_key_value(hidden_states)
```

torch version: 2.1.0a0+b5021ba
onnx version: 1.14.0
onnxruntime-gpu version: 1.15.1
opset: 17
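
One way to pin down such a discrepancy is to compare the eager fp16 PyTorch outputs against the fp16 ONNX Runtime outputs on the same input. A rough sketch of that comparison, where the model id, the ONNX file name, and the tolerances are assumptions:

```python
# Hypothetical numerical check; "chatglm2-6b-fp16.onnx" is an assumed export, not provided by the repo.
import numpy as np
import torch
import onnxruntime as ort
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm2-6b"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).half().cuda().eval()

input_ids = tok("hello", return_tensors="pt")["input_ids"]

# Reference logits from the eager fp16 PyTorch model.
with torch.no_grad():
    ref = model(input_ids.cuda()).logits.float().cpu().numpy()

# Same input through the fp16 ONNX graph on the CUDA execution provider.
sess = ort.InferenceSession("chatglm2-6b-fp16.onnx",
                            providers=["CUDAExecutionProvider"])
onnx_out = sess.run(None, {"input_ids": input_ids.numpy()})[0].astype(np.float32)

# Loose fp16 tolerances; a large max-abs error suggests the divergence
# starts in an early matmul such as self.query_key_value rather than
# accumulating gradually across layers.
print("max abs diff:", np.abs(ref - onnx_out).max())
np.testing.assert_allclose(ref, onnx_out, rtol=1e-2, atol=1e-2)
```

If the logits already disagree badly, adding intermediate tensors such as mixed_x_layer as extra graph outputs during export helps locate the first point of divergence.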

XiaokunDing avatar Aug 21 '23 12:08 XiaokunDing

any updates on this issue? I still can't export the chatglm2 onnx model.

manishghop avatar Dec 11 '23 20:12 manishghop