
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt

Open • DarkLight1337 opened this pull request 1 year ago • 3 comments

The calculation in get_max_multimodal_tokens assumes a single instance of multi-modal data (e.g. one image), so it becomes inconsistent with the dummy data used for profiling when that dummy data contains multiple instances of multi-modal data.

To support this case, this PR introduces the --limit-mm-per-prompt argument, which limits how many instances of multi-modal data are allowed per prompt. During profiling, the total number of multi-modal tokens for a given modality is obtained by multiplying the result of get_max_multimodal_tokens by the corresponding limit.
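
As a rough illustration of that calculation (a minimal sketch with simplified stand-in names, not the actual vLLM internals):

```python
# Minimal sketch of the profiling budget described above.
# Function and variable names are simplified stand-ins, not vLLM internals.
def max_tokens_for_profiling(max_tokens_per_instance: int,
                             limit_per_prompt: int) -> int:
    """Total multi-modal token budget for one modality during profiling."""
    return max_tokens_per_instance * limit_per_prompt

# Example: an image encoder that emits at most 576 tokens per image,
# served with `--limit-mm-per-prompt image=4`, budgets 2304 image tokens.
assert max_tokens_for_profiling(576, 4) == 2304
```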

Since the dev API has been significantly changed, this is also a good opportunity to rename SupportsVision to SupportsMultiModal.
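
For model implementers, the rename is mechanical; a rough sketch of what a model definition looks like after the change (the import path reflects recent vLLM versions and may not match this PR exactly):

```python
# Rough sketch of the rename's effect on a model definition.
# Import path taken from recent vLLM versions; it may differ in older releases.
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal  # was: SupportsVision


class MyMultiModalLM(nn.Module, SupportsMultiModal):
    """A multi-modal model now advertises its capability via SupportsMultiModal."""
    ...
```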

Checklist

  • [x] Update MultiModalConfig and CLI args with the new argument
  • [x] Update the calculation for the total number of multimodal tokens
  • [x] Enforce the limit during profiling (InputRegistry.dummy_data_for_profiling)
  • [x] Enforce the limit during inference (MultiModalRegistry.map_input)
  • [x] Add corresponding tests (except for calculation and profiling)
  • [x] Rename SupportsVision to SupportsMultiModal.

DarkLight1337 avatar Aug 04 '24 15:08 DarkLight1337

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

github-actions[bot] avatar Aug 04 '24 15:08 github-actions[bot]

Marking this as a draft to avoid conflicts with #7258. (It's still ready for review, just don't merge it yet.)

DarkLight1337 avatar Aug 07 '24 10:08 DarkLight1337

#7258 has been merged, so I'm marking this PR as ready again.

DarkLight1337 avatar Aug 09 '24 03:08 DarkLight1337

Hi! Does vLLM support multiple image inputs now?

xyfZzz avatar Aug 19 '24 08:08 xyfZzz

> Hi! Does vLLM support multiple image inputs now?

@xyfZzz Not yet - this PR itself enables profiling with multiple image inputs, but there are still a few things left to do before multi-image input works at inference time. Stay tuned!

ywang96 avatar Aug 19 '24 08:08 ywang96

> Hi! Does vLLM support multiple image inputs now?
>
> @xyfZzz Not yet - this PR itself enables profiling with multiple image inputs, but there are still a few things left to do before multi-image input works at inference time. Stay tuned!

Thanks! Since another three weeks have passed, I'd like to ask again: does vLLM support multiple image inputs now?

xyfZzz avatar Sep 08 '24 10:09 xyfZzz

Yes, it's supported now. Please check out the docs.

DarkLight1337 avatar Sep 08 '24 10:09 DarkLight1337

> Yes, it's supported now. Please check out the docs.

@DarkLight1337 Hi! I installed the latest main branch of vLLM and deployed MiniCPM-V-2.6, but this error occurred when calling the OpenAI-style interface:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}

Could you please help me find out why this error occurred?

xyfZzz avatar Sep 09 '24 05:09 xyfZzz

> Yes, it's supported now. Please check out the docs.
>
> @DarkLight1337 Hi! I installed the latest main branch of vLLM and deployed MiniCPM-V-2.6, but this error occurred when calling the OpenAI-style interface:
>
> openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'At most 1 image(s) may be provided in one request.', 'type': 'BadRequestError', 'param': None, 'code': 400}
>
> Could you please help me find out why this error occurred?

I found the cause of the error: I needed to set --limit-mm-per-prompt image=2 when deploying.
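
For anyone hitting the same error, a rough sketch of a working setup (the serve command in the comment, the model name, base URL, and image URLs are illustrative placeholders and may vary by version):

```python
# Assumes the server was launched with something along the lines of:
#   vllm serve openbmb/MiniCPM-V-2_6 --limit-mm-per-prompt image=2
# Model name, base URL, and image URLs below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What differs between these two images?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/first.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/second.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```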

xyfZzz avatar Sep 09 '24 05:09 xyfZzz