aresnow1
We added grammar input support for ggml models in https://github.com/xorbitsai/inference/pull/525. Are you interested in implementing this API?
Are you sure you selected chat when registering? It looks like the model only has the generate ability, not the chat ability.
Try changing roles to ["Human", "Assistant"] @faroasis
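For reference, here is a hedged sketch of where the roles would go in a custom model's registration JSON; the surrounding field names (model_name, prompt_style, style_name) are assumptions based on typical Xinference custom-model files, not taken from this thread:

```json
{
  "version": 1,
  "model_name": "my-llama2-chinese-chat",
  "model_ability": ["chat"],
  "prompt_style": {
    "style_name": "ADD_COLON_SINGLE",
    "system_prompt": "",
    "roles": ["Human", "Assistant"]
  }
}
```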
Thanks for sharing. If you're willing, could you open a PR to help fix it?

> The problem isn't there. The LlamaTokenizer vocabulary is too small, so Chinese text gets split across tokens. Decoding the whole output_ids yields complete Chinese characters, but the current approach of emitting output every stream_interval chops Chinese characters apart.
> LlamaTokenizer(name_or_path='C:\llama2\cn_chat', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("",...
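The quoted issue can be reproduced without any model: a Chinese character is 3 bytes in UTF-8, so flushing a stream mid-character (analogous to decoding a partial span of output_ids every stream_interval) yields fragments that are not valid UTF-8 on their own, while decoding the accumulated buffer is always correct. A minimal sketch:

```python
# Why per-interval decoding can emit broken Chinese characters:
# each character below is 3 bytes in UTF-8, so a flush boundary
# can land in the middle of a character.
text = "你好"
data = text.encode("utf-8")  # 6 bytes total, 3 per character

# Flushing after every 2 bytes splits each character across chunks.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

# Each chunk decoded alone produces replacement characters.
broken = [c.decode("utf-8", errors="replace") for c in chunks]
print(broken)

# Decoding the accumulated buffer recovers the intact text, which is
# why decoding the full output_ids gives complete Chinese characters.
print(b"".join(chunks).decode("utf-8"))  # -> 你好
```

A common fix is to hold back incomplete trailing bytes (or to re-decode from the last safe offset) before emitting each streamed chunk.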
We need to test it; some adaptation work may be required.
It seems vLLM's support for GPTQ is still in progress (https://github.com/vllm-project/vllm/pull/1580). How did you use vLLM with GPTQ?
It needs some changes to support passing the quantization method; I'll create a PR to support this later.
Could you start the service with `xinference --log-level=debug` and run it again? Please paste the full logs so we can get more information.
Run `pip show xinference` and `pip show chatglm_cpp` to check these packages' versions.