Qwen3
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
You updated the weights of `Qwen1.5-14B-Chat-GPTQ-Int4` about 12 days ago, changing `intermediate_size` from 14436 to 14336. It seems the Int4 version is not quantized directly from `Qwen1.5-14B-Chat`...
```
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
```
```python
model_path = "./model/qwen1_5-1_8b"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
prompt = '苹果是什么颜色'  # "What color are apples?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content":...
```
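For reference, a minimal end-to-end version of this snippet; the continuation of the truncated `messages` list and the generation call are reconstructed here following the standard `transformers` chat-template flow, not taken from the original report:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./model/qwen1_5-1_8b"  # local path from the excerpt above
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "苹果是什么颜色"},  # "What color are apples?"
]
# Render the conversation with the model's chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the reply.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```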
Ref #269, #264: the new 32B model outputs "!!!!!!" tokens when deployed on vLLM. However, a slight tweak to the system prompt seems to address the issue. See below: ##...
input:
```
{
    "model": "qwen1.5-72b-chat",
    "temperature": 0,
    "maxTokens": 8000,
    "stream": "false",
    "messages": [
        { "role": "system", "content": "Translate everything into Simplified Chinese. Please only include the translation result." },
        {"role": "user",...
```
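For context, payloads like this are typically posted to an OpenAI-compatible chat endpoint. A minimal sketch follows; the URL, port, and the final user message are assumptions, not from the report, and the field names are adjusted to the OpenAI schema:

```python
import requests

# Hypothetical OpenAI-compatible endpoint; adjust host/port to your deployment.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "qwen1.5-72b-chat",
    "temperature": 0,
    "max_tokens": 8000,  # the report used "maxTokens"; the OpenAI schema uses "max_tokens"
    "stream": False,     # a JSON boolean, not the quoted string from the report
    "messages": [
        {"role": "system", "content": "Translate everything into Simplified Chinese. Please only include the translation result."},
        {"role": "user", "content": "Hello, world."},  # placeholder user message
    ],
}
resp = requests.post(url, json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```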
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ? ...
vllm version: 0.4.0.post1
code:
```python
from vllm import LLM
import os

os.environ["VLLM_USE_MODELSCOPE"] = "True"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

llm = LLM(
    model="qwen/Qwen1.5-32B-Chat-GPTQ-Int4",
    trust_remote_code=True,
    gpu_memory_utilization=0.6,
)
output = llm.generate(
    "system\n"
    "You are...
```
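Hand-writing the chat markup as above is error-prone; a sketch of the same call using the tokenizer's chat template instead. The Hugging Face model ID, user message, and sampling settings here are placeholder assumptions, not from the report:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "Qwen/Qwen1.5-32B-Chat-GPTQ-Int4"  # assumed HF ID for the same model
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, gpu_memory_utilization=0.6)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "苹果是什么颜色"},  # placeholder user message
]
# Let the tokenizer emit the correct ChatML markup instead of writing it by hand.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```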
When the input is short (roughly under 50 tokens), the logprob becomes NaN and the output token becomes 0, so the output is '!!!!...' and does not stop until the generation limit is reached. Long inputs work fine; the AWQ version works fine.
How can I run batched inference?
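In case it helps, a minimal sketch of batched inference with vLLM, which batches and schedules requests internally when `generate` is given a list of prompts; the model name and prompts below are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen1.5-1.8B-Chat")  # placeholder model
prompts = [
    "What color are apples?",
    "What is the capital of France?",
]
sampling = SamplingParams(temperature=0.7, max_tokens=128)
# One call; vLLM processes the whole batch with continuous batching.
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```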