hxyghostor

Results 5 comments of hxyghostor

Seems latency is ~300 ms and QPS is 8 with an A100.

```
lmdeploy serve api_server /classification/qwen2-vl-2b-4bit-finetune --server-port $PORT0 --model-format awq --quant-policy 8
```

We have trained Qwen2-VL for a multi-class image classification task. Transfer...

To clarify: the batch size I tested with was 1. After deploying the API, I wrote concurrent calls to test its performance.
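A minimal sketch of such a concurrency test, assuming a generic `request_fn` callable (stubbed here with a sleep; in real use it would be the HTTP POST to the deployed endpoint):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(request_fn, num_requests=32, concurrency=8):
    """Fire num_requests calls at the given concurrency; return
    (average per-request latency in seconds, overall QPS)."""
    latencies = []

    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Consume the iterator so all calls complete before timing stops.
        list(pool.map(timed_call, range(num_requests)))
    wall = time.perf_counter() - wall_start

    return sum(latencies) / len(latencies), num_requests / wall

# Stub standing in for a real request to the api_server.
avg_latency, qps = benchmark(lambda: time.sleep(0.05))
print(f"avg latency: {avg_latency:.3f}s, QPS: {qps:.1f}")
```

With a real endpoint, QPS will depend on how the server batches the concurrent requests, so single-request latency and throughput should be reported separately, as above.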

server:

```
lmdeploy serve api_server /question_classification/qwen2-vl-2b-4bit --server-port $PORT0 --backend turbomind --model-format awq --enable-prefix-caching --quant-policy 8
```

client:

```python
import requests
import base64

api_url = "http://localhost:10516/v1/chat/completions"
image_path = ""
with open(image_path, "rb") as...
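Since the client snippet above is truncated, here is a self-contained sketch of how the request payload can be built in the OpenAI-compatible chat format that `api_server` exposes; the model name, prompt text, and image bytes are placeholders, not taken from the original snippet:

```python
import base64
import json

# Placeholder image bytes; in practice these come from reading the image file.
image_bytes = b"\x89PNG\r\n\x1a\n"
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

payload = {
    "model": "qwen2-vl-2b-4bit",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Classify this image."},
                {
                    # Image is passed inline as a base64 data URL.
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
}

# This dict would then be POSTed with requests.post(api_url, json=payload).
print(json.dumps(payload)[:60])
```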

Is there a time estimate for when turbomind will support Qwen2-VL?

OK, thanks for your reply.