[Model] Add Support for Ovis1.6-Gemma2-9B Model
This pull request addresses issue #9638 by adding support for the Ovis1.6-Gemma2-9B model.
FIX #8972 FIX #9638
👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run further CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can do one of these:
- Add the `ready` label to the PR
- Enable auto-merge.
🚀
Any news?
Hey @Isotr0py, could you give this PR a review?
Please address pre-commit linting errors as well.
> Please address pre-commit linting errors as well.
Thanks @Isotr0py for the review, I'll get back to it.
Will this PR also cover the new Ovis 2 models? https://huggingface.co/collections/AIDC-AI/ovis2-67ab36c7e497429034874464
I'll add the tests for it.
@Isotr0py I am facing this issue in the OvisProcessor:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ubuntu/oracle/vllm/test.py", line 5, in <module>
[rank0]: model = LLM(model=model_name,max_model_len=8192)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/utils.py", line 1045, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/entrypoints/llm.py", line 243, in __init__
[rank0]: self.llm_engine = self.engine_class.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 277, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 426, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/executor/executor_base.py", line 102, in determine_num_available_blocks
[rank0]: results = self.collective_rpc("determine_num_available_blocks")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[rank0]: answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/utils.py", line 2232, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/conda/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/worker/worker.py", line 229, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/opt/conda/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/worker/model_runner.py", line 1243, in profile_run
[rank0]: self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/worker/model_runner.py", line 1308, in _dummy_run
[rank0]: .dummy_data_for_profiling(self.model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/inputs/registry.py", line 336, in dummy_data_for_profiling
[rank0]: dummy_data = profiler.get_dummy_data(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/profiling.py", line 168, in get_dummy_data
[rank0]: mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/profiling.py", line 141, in _get_dummy_mm_inputs
[rank0]: return self.processor.apply(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1476, in apply
[rank0]: ) = self._cached_apply_hf_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1268, in _cached_apply_hf_processor
[rank0]: ) = self._apply_hf_processor_main(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1209, in _apply_hf_processor_main
[rank0]: prompt_ids = self._apply_hf_processor_text_only(prompt)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1132, in _apply_hf_processor_text_only
[rank0]: prompt_ids, _, _ = self._apply_hf_processor_text_mm(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1102, in _apply_hf_processor_text_mm
[rank0]: processed_data = self._call_hf_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/model_executor/models/ovis.py", line 378, in _call_hf_processor
[rank0]: return super()._call_hf_processor(prompt=prompt,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1065, in _call_hf_processor
[rank0]: return self.info.ctx.call_hf_processor(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ubuntu/oracle/vllm/vllm/inputs/registry.py", line 172, in call_hf_processor
[rank0]: raise RuntimeError(msg) from exc
[rank0]: RuntimeError: Failed to apply OvisProcessor on data={'text': '<image>'} with kwargs={}
Somehow the <image> token is not handled properly during the profiling phase of vLLM. Can you point me in the right direction on how multimodal processing is done in vLLM? I have tried to pass input_ids with the image placeholder token IDs and the pixel values output by the processor, but I don't know exactly where that goes.
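For reference, the failing call matches a minimal script along these lines (the checkpoint path here is my assumption; substitute whatever Ovis1.6 checkpoint you are testing with):

```python
from vllm import LLM

# Assumed checkpoint path; any Ovis1.6-Gemma2-9B checkpoint hits the same
# failure during engine initialization.
model_name = "AIDC-AI/Ovis1.6-Gemma2-9B"

# Fails inside profile_run() while building dummy multimodal inputs.
model = LLM(model=model_name, max_model_len=8192)
```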
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Player256.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
> Somehow the <image> token is not handled properly during the profiling phase of vLLM. Can you point me in the right direction on how multimodal processing is done in vLLM?
I think you need to implement text-only processing for OvisProcessor, because text and images are fed to the processor separately in some cases. (IIRC, the original Ovis processor doesn't support text-only inputs.)
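A minimal sketch of what I mean, assuming the processor class name from ovis.py and the current `_call_hf_processor` signature (this mirrors how other vLLM models, e.g. fuyu.py, fall back to the tokenizer for text-only input):

```python
from collections.abc import Mapping

from transformers import BatchFeature

from vllm.multimodal.processing import BaseMultiModalProcessor


class OvisMultiModalProcessor(BaseMultiModalProcessor):
    # Class name assumed; other processor methods from the PR are elided here.

    def _call_hf_processor(
        self,
        prompt: str,
        mm_data: Mapping[str, object],
        mm_kwargs: Mapping[str, object],
    ) -> BatchFeature:
        # The upstream OvisProcessor cannot handle text-only calls, so bypass it
        # and tokenize the prompt directly when no multimodal data is present.
        if not mm_data:
            prompt_ids = self.info.get_tokenizer().encode(prompt)
            return BatchFeature(dict(input_ids=[prompt_ids]), tensor_type="pt")

        return super()._call_hf_processor(
            prompt=prompt,
            mm_data=mm_data,
            mm_kwargs=mm_kwargs,
        )
```

With a fallback like this, the profiling call that only passes `{'text': '<image>'}` should no longer reach the OvisProcessor directly.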
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Player256.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Closing as superseded by #17861