Yiming Cui
https://github.com/ymcui/Chinese-LLaMA-Alpaca/issues/315
It is not solved yet. You can actively follow this PR, where the issue is being investigated: https://github.com/ggerganov/llama.cpp/pull/1826
> LlamaChat v2 is coming with expanded support for ggml and other models. Development has stalled for a bit, but hopefully I'll be able to get back to it soon...
See the FAQ answer: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/faq_zh
Thanks for your interest; we expect to share the models with everyone very soon.
The original model can run on the GPU, and so can ours.
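A minimal sketch of GPU loading through transformers, assuming a local checkpoint (the path below is a placeholder) and that accelerate is installed so `device_map="auto"` works:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-3-chinese-8b-instruct"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision fits an 8B model on a 24 GB GPU
    device_map="auto",          # let accelerate place layers on the available GPU(s)
)
```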
Are you loading the GGUF version? The contents of the instruction template: ``` {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '' + message['role'] + ' '+ message['content'] | trim + '' %}{%...
We did not change the instruction template; it is identical to Meta-Llama-3-8B-Instruct's. Loading Meta-Llama-3-8B-Instruct does indeed show the endless-generation problem; we can only wait for these downstream tools to catch up. Currently tested and working: native transformers, llama.cpp, and LM Studio. The rest all have issues to varying degrees.
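A quick way to confirm the templates match is to render the chat template with transformers' `apply_chat_template`; a sketch, with `model_path` as a placeholder:

```python
from transformers import AutoTokenizer

model_path = "path/to/llama-3-chinese-8b-instruct"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好"},
]
# Renders the Jinja chat template stored in tokenizer_config.json and
# appends the header that cues the assistant's turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # compare against the same call on Meta-Llama-3-8B-Instruct
```

(The endless-generation symptom in downstream tools usually traces to `<|eot_id|>` not being treated as a stop token.)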
1) llama.cpp only recently changed its pre-tokenizer, and other downstream programs (such as ollama) may not adapt quickly; 2) the Modelfile may need updating. My suggestion is to wait a bit longer for downstream adaptation; alternatively, you can run inference directly with upstream llama.cpp.
I just tried the original Meta-Llama-3-8B-Instruct and it shows a similar problem; let's wait for downstream adaptation. llama.cpp does not have this issue.
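For direct inference against the GGUF file, a sketch using the llama-cpp-python bindings over upstream llama.cpp; the model path is a placeholder, and `chat_format="llama-3"` assumes a build recent enough to ship that template:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-chinese-8b-instruct-q4_0.gguf",  # placeholder path
    chat_format="llama-3",  # built-in Llama 3 template; stops on <|eot_id|>
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "你好"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```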