
[Model] Add Support for Ovis1.6-Gemma2-9B Model

Open · Player256 opened this pull request 11 months ago • 10 comments

This pull request addresses issue #9638 by adding support for the Ovis1.6-Gemma2-9B model.

FIX #8972 FIX #9638

Player256 avatar Dec 16 '24 21:12 Player256

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

github-actions[bot] avatar Dec 16 '24 21:12 github-actions[bot]

Any news?

Swipe4057 avatar Jan 15 '25 14:01 Swipe4057

Hey @Isotr0py could you give this PR a review?

Player256 avatar Feb 03 '25 12:02 Player256

Please address pre-commit linting errors as well.

Isotr0py avatar Feb 04 '25 07:02 Isotr0py

> Please address pre-commit linting errors as well.

Thanks @Isotr0py for the review, I'll get back to it.

Player256 avatar Feb 04 '25 08:02 Player256

Will this PR also cover the new Ovis 2 models? https://huggingface.co/collections/AIDC-AI/ovis2-67ab36c7e497429034874464

ismael-dm avatar Feb 24 '25 13:02 ismael-dm

I'll add the tests for it.

Player256 avatar Feb 27 '25 09:02 Player256

@Isotr0py I am facing this issue in the OvisProcessor.

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/ubuntu/oracle/vllm/test.py", line 5, in <module>
[rank0]:     model = LLM(model=model_name,max_model_len=8192)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/utils.py", line 1045, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/entrypoints/llm.py", line 243, in __init__
[rank0]:     self.llm_engine = self.engine_class.from_engine_args(
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
[rank0]:     engine = cls(
[rank0]:              ^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 277, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/engine/llm_engine.py", line 426, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/executor/executor_base.py", line 102, in determine_num_available_blocks
[rank0]:     results = self.collective_rpc("determine_num_available_blocks")
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[rank0]:     answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/utils.py", line 2232, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/worker/worker.py", line 229, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/opt/conda/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/worker/model_runner.py", line 1243, in profile_run
[rank0]:     self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/worker/model_runner.py", line 1308, in _dummy_run
[rank0]:     .dummy_data_for_profiling(self.model_config,
[rank0]:      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/inputs/registry.py", line 336, in dummy_data_for_profiling
[rank0]:     dummy_data = profiler.get_dummy_data(
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/profiling.py", line 168, in get_dummy_data
[rank0]:     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/profiling.py", line 141, in _get_dummy_mm_inputs
[rank0]:     return self.processor.apply(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1476, in apply
[rank0]:     ) = self._cached_apply_hf_processor(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1268, in _cached_apply_hf_processor
[rank0]:     ) = self._apply_hf_processor_main(
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1209, in _apply_hf_processor_main
[rank0]:     prompt_ids = self._apply_hf_processor_text_only(prompt)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1132, in _apply_hf_processor_text_only
[rank0]:     prompt_ids, _, _ = self._apply_hf_processor_text_mm(
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1102, in _apply_hf_processor_text_mm
[rank0]:     processed_data = self._call_hf_processor(
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/model_executor/models/ovis.py", line 378, in _call_hf_processor
[rank0]:     return super()._call_hf_processor(prompt=prompt,
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/multimodal/processing.py", line 1065, in _call_hf_processor
[rank0]:     return self.info.ctx.call_hf_processor(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/ubuntu/oracle/vllm/vllm/inputs/registry.py", line 172, in call_hf_processor
[rank0]:     raise RuntimeError(msg) from exc
[rank0]: RuntimeError: Failed to apply OvisProcessor on data={'text': '<image>'} with kwargs={}

Somehow the <image> token is not handled properly during the profiling phase of vLLM. Can you point me in the right direction on how multimodal processing is done in vLLM? I have tried to pass input_ids with image-placeholder token ids and the pixel values that the processor outputs, but I don't know exactly where they go.

Player256 avatar Mar 02 '25 18:03 Player256
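
For reference, a minimal sketch of the failing call (the exact checkpoint id is an assumption; the traceback above shows the same `LLM(...)` call from `test.py`):

```python
# Repro sketch: vLLM profiles dummy multimodal inputs at engine startup,
# which is where OvisProcessor fails on the text-only '<image>' prompt.
from vllm import LLM

model_name = "AIDC-AI/Ovis1.6-Gemma2-9B"  # assumed checkpoint id
model = LLM(model=model_name, max_model_len=8192)  # raises the RuntimeError above
```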

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Player256.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Mar 03 '25 02:03 mergify[bot]

> Somehow the <image> token is not handled properly during the profiling phase of vLLM. Can you point me in the right direction on how multimodal processing is done in vLLM? I have tried to pass input_ids with image-placeholder token ids and the pixel values that the processor outputs, but I don't know exactly where they go.

I think you need to implement text-only processing for OvisProcessor, because text and images are fed to the processor separately in some cases. (IIRC, the original OvisProcessor doesn't support text-only inputs.)

Isotr0py avatar Mar 03 '25 12:03 Isotr0py
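
For illustration, a rough sketch of such a text-only fallback, assuming the `BaseMultiModalProcessor` API visible in the traceback (`_call_hf_processor`, `self.info.get_tokenizer()`); the class name mirrors the PR, but the fallback body itself is hypothetical:

```python
from transformers import BatchFeature

from vllm.multimodal.processing import BaseMultiModalProcessor


class OvisMultiModalProcessor(BaseMultiModalProcessor):

    def _call_hf_processor(self, prompt, mm_data, mm_kwargs):
        if not mm_data:
            # Text-only input: the upstream OvisProcessor rejects prompts
            # without images, so tokenize the prompt directly instead.
            prompt_ids = self.info.get_tokenizer().encode(prompt)
            return BatchFeature(dict(input_ids=[prompt_ids]), tensor_type="pt")
        # Multimodal input: defer to the regular HF processor path.
        return super()._call_hf_processor(prompt=prompt,
                                          mm_data=mm_data,
                                          mm_kwargs=mm_kwargs)
```

Several other multimodal models in vLLM use a similar guard, which would explain why profiling, which applies the processor to text and images separately, trips the error here.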

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Player256.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Apr 06 '25 03:04 mergify[bot]

Closing as superseded by #17861

DarkLight1337 avatar May 10 '25 16:05 DarkLight1337