moshilangzi

Results 17 comments of moshilangzi

> What does "Pointer" mean? I'm not sure this is a model quantized in the standard AWQ format, so it hasn't been added to the default support list.

Mr. Qin, Pointer is a company; besides this AWQ it released, is there any other AWQ quant that can be deployed?

> What does "pointer" mean? I don't know whether this is a model quantized in the standard AWQ format, so it hasn't been added to the default support list.

Could support be added for https://huggingface.co/Benasd/Qwen2.5-VL-72B-Instruct-AWQ, for vLLM inference?

> Thanks for putting that model on huggingface!
>
> I am not sure if I want to create a new issue for it, but we got a different bug...

> While waiting for the official release, I found this model on HuggingFace. It looks like it works well. (https://huggingface.co/PointerHQ/Qwen2.5-VL-72B-Instruct-Pointer-AWQ)

Can this one be deployed successfully with vLLM?

> I ran https://huggingface.co/Benasd/Qwen2.5-VL-7B-Instruct-AWQ with vLLM without problems; I haven't tried https://huggingface.co/Benasd/Qwen2.5-VL-72B-Instruct-AWQ yet.

My error with Benasd/Qwen2.5-VL-72B-Instruct-AWQ:

```
(VllmWorkerProcess pid=40329) ERROR 02-18 14:47:14 multiproc_worker_utils.py:242]     raise ValueError(
(VllmWorkerProcess pid=40329) ERROR 02-18 14:47:14 multiproc_worker_utils.py:242] ValueError: The input size is not aligned...
```
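An "input size is not aligned" error from vLLM typically means the quantized weight dimensions cannot be split evenly across tensor-parallel workers (AWQ commonly uses a quantization group size of 128). A minimal launch sketch, assuming the cause here is the tensor-parallel split — the flags are standard vLLM options, but whether a different `--tensor-parallel-size` fixes this particular model is an assumption:

```shell
# Sketch of a workaround: choose a tensor-parallel size that evenly divides
# the AWQ group-quantized weight dimensions (e.g. try 4 instead of a size
# that leaves a remainder). All flags are standard vLLM serve options.
vllm serve Benasd/Qwen2.5-VL-72B-Instruct-AWQ \
  --quantization awq \
  --dtype half \
  --max-model-len 4096 \
  --tensor-parallel-size 4
```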

> I am running [unsloth/Qwen2.5-VL-72B-Instruct-bnb-4bit](https://modelscope.cn/models/unsloth/Qwen2.5-VL-72B-Instruct-bnb-4bit) successfully; note that the dynamic quant one is not yet working, at least not for me.
>
> You might want to [build vLLM from...

> This should have been supported; you can try serving the AWQ model with this command:
>
> vllm serve OpenGVLab/InternVL2_5-78B-AWQ --quantization awq --dtype half --max-model-len 4096

Thank you. How...
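Once a `vllm serve` command like the one above is running, it exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default host and port (`localhost:8000`) and that the served model name matches the repo id — `build_chat_payload` and `query_vllm` are illustrative helpers, not part of vLLM:

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_vllm(base_url: str, payload: dict) -> dict:
    """POST the payload to a running vLLM OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_chat_payload(
    "OpenGVLab/InternVL2_5-78B-AWQ", "Describe AWQ quantization in one sentence."
)
# query_vllm("http://localhost:8000", payload)  # requires the server to be running
```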