Chinese-LLaMA-Alpaca: Newbie question, is llama.cpp the only way to quantize?

As the title says, I want to run inference with HF transformers, but the manual model merging and conversion section only lists these steps:

Convert the original LLaMA
Merge the LoRA

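There is no quantization step. Where is quantization done? One more thing I don't understand: are LLaMA and Alpaca the same thing? In other words, can the LoRA weights of both Chinese LLaMA and Chinese Alpaca be merged with the original LLaMA?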

May 03 '23 13:05 AaronZLT

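1) Once converted to HF or PTH format, the model is effectively no different from the original LLaMA. Whether to quantize depends on which program you plan to plug it into. HF inference does not require a quantized model. 2) For the difference between LLaMA and Alpaca, see: https://github.com/ymcui/Chinese-LLaMA-Alpaca#我应该选什么模型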

May 04 '23 01:05 ymcui

Quantization is mainly about reducing RAM/VRAM requirements. With transformers you should quantize with GPTQ; see https://github.com/qwopqwop200/GPTQ-for-LLaMa
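
To make the memory argument concrete, here is a rough back-of-the-envelope sketch of the space taken by the weights alone at different precisions. The parameter count of 7e9 is a nominal figure for a "7B" model and is an assumption, as is the helper name `weight_memory_gib`. The estimate ignores activations, the KV cache, and the extra scales/zero-points that real quantized formats store.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1024**3


N_PARAMS_7B = 7e9  # nominal parameter count, assumed for illustration

fp16_gib = weight_memory_gib(N_PARAMS_7B, 16)  # half precision
int8_gib = weight_memory_gib(N_PARAMS_7B, 8)   # 8-bit quantization
int4_gib = weight_memory_gib(N_PARAMS_7B, 4)   # 4-bit quantization

for name, size in [("fp16", fp16_gib), ("int8", int8_gib), ("int4", int4_gib)]:
    print(f"{name}: ~{size:.1f} GiB")
```

Under these assumptions the weights shrink from roughly 13 GiB at fp16 to a bit over 3 GiB at 4 bits, which is why quantization matters mostly when RAM/VRAM is the constraint.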

May 04 '23 06:05 bash99

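OK, thanks @ymcui and @bash99! 😊 After going through the merging steps, I have successfully deployed with both llama.cpp and HF~ But I have run into a problem: HF inference with Alpaca 7B has almost no memory of the conversation context. Does this also need parameter tuning?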

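As for quantization, I will first try out how GPTQ performs at inference. I also noticed that whenever num_beams is anything other than 1, the following error shows up easily, though it does not really matter...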

Traceback (most recent call last):
  File "/home/#####/Chinese-LLaMA-Alpaca/scripts/inference_hf.py", line 104, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/home/#####/anaconda3/envs/llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/#####/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1562, in generate
    return self.beam_sample(
           ^^^^^^^^^^^^^^^^^
  File "/home/#####/anaconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 3187, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either inf, nan or element < 0
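
The RuntimeError above means the probabilities handed to the sampler failed a sanity check. The pure-Python sketch below only illustrates that condition: it shows how a single non-finite logit turns a whole softmax output into nan. It is not a diagnosis of what actually happened in this run, and the helper names `softmax` and `is_valid_probs` are made up for the example.

```python
import math


def softmax(logits):
    """Plain softmax over a list of floats, with the usual max subtraction."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_probs(probs):
    """Check the condition named in the error: no inf, no nan, no element < 0."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


finite_probs = softmax([1.0, 2.0, 3.0])
# One logit overflowed to inf: inf - inf is nan, and nan spreads to every entry.
broken_probs = softmax([1.0, float("inf"), 3.0])

print(is_valid_probs(finite_probs))
print(is_valid_probs(broken_probs))
```

A check like this, applied to the scores just before sampling, can help narrow down at which step the values stop being finite.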

May 04 '23 14:05 AaronZLT

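If you run inference with inference_hf.py, then "almost no context memory" is expected. inference_hf.py is only meant to let people try the model out quickly: each question and answer is handled independently, and no logic for multi-turn dialogue is implemented.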

We recommend using llama.cpp to try multi-turn dialogue.
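
For readers who still want to experiment with HF, the missing multi-turn logic could be sketched roughly as follows: keep the earlier turns and fold them into each new prompt. This is a minimal sketch, not the project's implementation. The template text, the `User:`/`Assistant:` turn format, and the names `build_multiturn_prompt`, `chat_turn`, and `stub_model` are all assumptions made for illustration; `stub_model` stands in for a real call to `model.generate`.

```python
# Hypothetical Alpaca-style template; the wording used by the project may differ.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)


def build_multiturn_prompt(history, question):
    """history is a list of (question, reply) pairs from earlier turns."""
    turns = [f"User: {q}\nAssistant: {a}" for q, a in history]  # assumed format
    turns.append(f"User: {question}")
    return PROMPT_TEMPLATE.format(instruction="\n".join(turns))


def chat_turn(history, question, generate_fn):
    """Run one turn: build the prompt, call the model, record the exchange."""
    prompt = build_multiturn_prompt(history, question)
    reply = generate_fn(prompt)
    history.append((question, reply))
    return prompt, reply


def stub_model(prompt):
    """Placeholder for a real model call; returns a canned string."""
    return f"(reply to a {len(prompt)}-character prompt)"


history = []
first_prompt, _ = chat_turn(history, "What is LLaMA?", stub_model)
second_prompt, _ = chat_turn(history, "And Alpaca?", stub_model)
print(second_prompt)
```

The prompt grows with every turn, so a real version would also need to truncate or summarize old turns to stay within the model's context length.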

May 04 '23 14:05 airaria

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

May 12 '23 00:05 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

May 15 '23 22:05 github-actions[bot]