Junyang Lin

Results: 173 comments by Junyang Lin

The scores are the processed logits; I think you should directly get `output["logits"]` instead. Check if it works, and see https://github.com/huggingface/transformers/blob/main/src/transformers/generation/utils.py for a better understanding.
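
A minimal sketch of the difference, assuming a recent transformers version (the model name is just an example): raw logits come from a plain forward pass, while the `scores` returned by `generate()` have the logits processors already applied.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen1.5-14B-Chat"  # example model; any causal LM behaves the same
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto")

inputs = tokenizer("Hello", return_tensors="pt")

# Raw, unprocessed logits: plain forward pass.
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, vocab_size)

# `scores` from generate() are post-processed (temperature, top-p, ...).
out = model.generate(
    **inputs, max_new_tokens=8,
    return_dict_in_generate=True, output_scores=True,
)
processed = out.scores  # tuple of (batch, vocab_size) tensors, one per step
```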

For the installation of auto-gptq, we advise you to install it from source (git clone the repo and run `pip install -e .`); otherwise you may run into the "CUDA not installed" issue.
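
For example (the repo URL is the AutoGPTQ home as I know it; adjust if yours differs):

```bash
# Build auto-gptq from source so the CUDA extension is compiled locally.
git clone https://github.com/AutoGPTQ/AutoGPTQ.git
cd AutoGPTQ
pip install -e .  # editable install; builds the CUDA kernels against your toolkit
```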

Stay tuned for our upcoming tech report. For now we are not releasing details about this.

Models without `-chat` in their names are not meant for chatting. In fact, the base models are usually intended for finetuning. They are trained purely by next-token prediction on large-scale...
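
To illustrate, a minimal sketch (model name is an example): a base model simply continues text via next-token prediction, so you prompt it with plain text rather than a chat dialogue.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen1.5-14B"  # note: no `-Chat` suffix, this is the base model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto", device_map="auto")

# The base model just continues the prompt; it will not follow instructions.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```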

I suspect you are using the base model instead of the chat model. Use Qwen1.5-14B-Chat and follow the example code.
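
For reference, a sketch of the usual chat-model usage with `apply_chat_template`; the generation settings here are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen1.5-14B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Render the dialogue with the model's chat template before generating.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(generated[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```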

Next week I'll provide instructions. You can take a look at `model.chat()` in our previous Qwen code and see if you can do it yourself.
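
If you want to try before then, here is a rough sketch of what a `model.chat()`-style helper can look like on top of `apply_chat_template`; the history format (a list of (query, response) pairs) is an assumption, not the exact original implementation.

```python
# Hypothetical re-implementation of a chat() helper, not the original code.
def chat(model, tokenizer, query, history=None, max_new_tokens=512):
    history = history or []
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    # Replay previous turns so the model sees the full conversation.
    for q, a in history:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": query})

    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    response = tokenizer.decode(
        out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    return response, history + [(query, response)]
```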

> For multiple GPUs you may need to add the `--tensor-parallel-size` argument. With it I no longer get the OOM error, but I hit other CUDA errors.

Yeah, you need this for tensor parallelism to deploy the large model across multiple devices.
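
For example, with vLLM's Python API (the model name and GPU count are illustrative; the `--tensor-parallel-size` CLI flag maps to the same parameter when serving):

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the model across that many GPUs (4 is an example).
llm = LLM(model="Qwen/Qwen1.5-72B-Chat", tensor_parallel_size=4)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```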

Sorry, for now we are not going to release the details. Stay tuned for the coming tech report.

It is about the quantization level: you can regard q6 as 6-bit quantization and q2 as 2-bit quantization. For sure, fp16 / bf16 should perform...
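
As a back-of-envelope illustration (ignoring GGUF metadata and per-block scale overhead, so the real files are slightly larger), the bit width translates roughly into weight memory like this:

```python
# Rough approximation only: parameters * bits / 8 bytes, for a 14B model.
def approx_size_gb(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 1e9

for name, bits in [("fp16", 16), ("q6", 6), ("q2", 2)]:
    print(f"{name}: ~{approx_size_gb(14e9, bits):.1f} GB")
# fp16: ~28.0 GB, q6: ~10.5 GB, q2: ~3.5 GB
```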