ChatGLM2-6B

[Help] After merging the LoRA fine-tuned weights, inference is noticeably slower

Open daydayup-zyn opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

I fine-tuned with LoRA on only 20 training samples. After merging the weights into the model, inference is much slower than with the original base model. What could be causing this? Is it related to the fine-tuning hyperparameters?
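For reference, a minimal merging sketch, assuming PEFT's `merge_and_unload` and placeholder paths (`./lora_output`, `./merged_model`). One common cause of this kind of slowdown is the merged checkpoint being saved and later reloaded in fp32 rather than fp16:

```python
# Sketch: merge LoRA weights into ChatGLM2-6B and keep the result in fp16.
# The paths below are placeholders, not the reporter's actual paths.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
lora = PeftModel.from_pretrained(base, "./lora_output")

# merge_and_unload folds the LoRA deltas into the base weights
# and removes the adapter wrappers.
merged = lora.merge_and_unload()

# If the merged checkpoint ends up in fp32, reloading it roughly doubles
# memory use and slows inference; cast back to half precision before saving.
merged = merged.half()
merged.save_pretrained("./merged_model")

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
tokenizer.save_pretrained("./merged_model")

# Quick sanity check: every parameter should report torch.float16.
print({p.dtype for p in merged.parameters()})
```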

Expected Behavior

No response

Steps To Reproduce

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

daydayup-zyn · Oct 13 '23 02:10

Did you ever manage to solve this? We hit the same problem when applying LoRA to this model... Specifically, inference is very slow for roughly the first half of the batches and then very fast for the second half. We also observed that most of the model's outputs are empty...

ExtremelyDarkSun · Jan 31 '24 08:01

Has this problem been resolved? If so, how did you fix it?

fywu · Aug 09 '24 06:08