JaonLiu
> No, I didn't encounter that error. Can you give me more context?

Just use:

```
instructions = [
    "模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价",  # "In the style of Lu Xun, grumble about the recent rise in canteen food prices"
]
```
After deploying Qwen1.5-0.5B-Chat-GPTQ-Int4 and Qwen1.5-0.5B-Chat-GPTQ-Int8 with vLLM, every request fails. Only Qwen1.5-0.5B-Chat-AWQ and Qwen1.5-0.5B-Chat return results normally. The error is the same in each case:

```
ransformers/tokenization_utils_fast.py", line 612, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'
```
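For reference, a minimal repro sketch assuming the model is loaded through vLLM's offline `LLM` API; the report above does not show the exact deployment command, so the model path and sampling settings here are placeholders:

```
from vllm import LLM, SamplingParams

# Assumed repro: load the GPTQ checkpoint with vLLM and run one request.
# The TypeError above is raised while detokenizing the generated token ids.
llm = LLM(model="Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4", quantization="gptq")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价"], params)
print(outputs[0].outputs[0].text)
```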
@JianxinMa Please help.
Same problem here, +1.
same question
> > same OOM question
>
> Setting placement_policy='cpu' can alleviate this issue.

How long would it take to run train_sft.py using only the CPU?
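For context, a minimal sketch of how a CPU placement policy is typically passed to ColossalAI's Gemini plugin; the exact class and argument names used by train_sft.py depend on the pinned ColossalAI version, so treat everything below as an assumption rather than the project's actual code:

```
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# Assumed setup: placement_policy="cpu" offloads parameters and optimizer states
# to host RAM and streams them to the GPU as needed, so compute still runs on the
# GPU while VRAM pressure drops sharply (at the cost of slower steps).
colossalai.launch_from_torch(config={})
plugin = GeminiPlugin(placement_policy="cpu")
booster = Booster(plugin=plugin)
# model, optimizer, criterion, dataloader, lr_scheduler = booster.boost(...)
```

Note that this only offloads states to the CPU; the forward and backward passes still run on the GPU, so it is not a pure-CPU run.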
Heal our children!
> We ran our LLaMA 7B on 4 * A100 80G. If you want to run it on a 40G A100, you can use a smaller batch size and increase accumulation_steps...
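To illustrate the trade-off, a small sketch with made-up numbers; the actual flag names and defaults in train_sft.py may differ:

```
# Effective batch size = per-device batch * accumulation steps * number of GPUs.
# Shrinking the per-device batch to fit a 40G card and raising accumulation_steps
# by the same factor keeps the effective batch size (and optimizer behaviour) unchanged.
num_gpus = 4

per_device_batch_80g = 4        # hypothetical value that fits on an 80G A100
accumulation_steps_80g = 8
effective_80g = per_device_batch_80g * accumulation_steps_80g * num_gpus    # 128

per_device_batch_40g = 2        # halved to fit a 40G A100
accumulation_steps_40g = 16     # doubled to compensate
effective_40g = per_device_batch_40g * accumulation_steps_40g * num_gpus    # 128

assert effective_80g == effective_40g
```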
@yyoon Could you help to solve it? Thanks a lot!
I can't find where the function load_msra_ner_without_dev is defined.