vllm icon indicating copy to clipboard operation
vllm copied to clipboard

[Usage]: I would like to know how to transfer fps and max_pixels after starting a qwen2vl-7b service using vllm?

Open hyyuananran opened this issue 1 year ago • 7 comments

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

Before submitting a new issue...

  • [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

hyyuananran avatar Dec 31 '24 07:12 hyyuananran

You should sample the video frames outside of vLLM.

You can set max_pixels via the mm_processor_kwargs key (which is passed alongside multi_modal_data) in offline inference. This isn't supported in online inference though, so if you're using vllm serve then you have to pass it at startup time.

DarkLight1337 avatar Dec 31 '24 10:12 DarkLight1337

I passed the max_pixels parameter through mm_processor_kwargs, but encountered an error: api_server.py error argument --mm_processor_kwargs invalid loads value max_pixels:798 command: python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8088 --model /app/qwen2vl-7b --tensor-parrallel 1 --gpu-memory-utilization 0.95 --served-model-name qwen2vl-7b --mm_processor_kwargs {"max_pixels":798} --trust-remote-code

hyyuananran avatar Jan 02 '25 06:01 hyyuananran

You need to pass it as a JSON string. You can enclose it with single quotes, i.e. '{"max_pixels":798}'

DarkLight1337 avatar Jan 02 '25 06:01 DarkLight1337

You should sample the video frames outside of vLLM.

You can set max_pixels via the mm_processor_kwargs key (which is passed alongside multi_modal_data) in offline inference. This isn't supported in online inference though, so if you're using vllm serve then you have to pass it at startup time.

@DarkLight1337 Hi, for Qwen2.5-VL online inference, I expect to pass the fps parameter to mm_processor_kwargs, which is required to calculate the second_pre_grid_t parameter accurately. But I see that extra_body does not support the mm_processor_kwargs parameter at present. I would like to ask, do we have plans to support passing the fps parameter through mm_processor_kwargs by extra_body or something else?

https://docs.vllm.ai/en/v0.6.3/serving/openai_compatible_server.html

wulipc avatar Feb 18 '25 07:02 wulipc

We don't yet have plans to add this. Feel free to open a PR and contribute to this!

DarkLight1337 avatar Feb 18 '25 07:02 DarkLight1337

We don't yet have plans to add this. Feel free to open a PR and contribute to this!

@DarkLight1337 OK, let me confirm, the optimal solution is to pass mm_processor_kwargs through extra_body, right?

wulipc avatar Feb 18 '25 07:02 wulipc

Yes, I think that would work.

DarkLight1337 avatar Feb 18 '25 07:02 DarkLight1337