[New Model]: Florence-2
The model to consider.
https://huggingface.co/microsoft/Florence-2-base
The closest model vllm already supports.
phi-3v, it's a VLM
What's your difficulty of supporting the model you want?
No response
@DarkLight1337 Anyone working on this?
No, but please wait for #5852 and #5276 to land first as they involve significant API changes for devs. In the meantime, you can take a look at this guide to get an idea of how to implement a new model.
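For reference, the registration step in that guide roughly follows the pattern below. This is a minimal sketch only: the Florence2ForConditionalGeneration class and its import path are placeholders for an implementation that does not exist yet.

# Minimal sketch of registering a new architecture with vLLM.
# The class name and import path below are placeholders.
from vllm import ModelRegistry
from my_models.florence2 import Florence2ForConditionalGeneration  # hypothetical

# Map the architecture string from the HF config's "architectures" field
# to the implementing class so the engine can resolve it.
ModelRegistry.register_model("Florence2ForConditionalGeneration",
                             Florence2ForConditionalGeneration)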
Thanks, checking the guide and the previous PRs of adding phi3-vision, also #5276
Both https://github.com/vllm-project/vllm/pull/5852 and https://github.com/vllm-project/vllm/pull/5276 are merged. Do you still have plans to work on this, @chandeldivyam?
@fcakyon Thanks for the reminder, it actually slipped my mind. Yes, I need Florence-2 for a project I'm working on. As an alternative for quick prototyping, I created a Flask server, but it's not the ideal solution. I will pick this up next week. Thanks!
Are you working on something that would need it?
@chandeldivyam Yes, I also need such a solution for my work. I'm trying to utilize https://github.com/Lightning-AI/LitServe since I only have a little experience with the vllm-project.
@fcakyon have you looked into any benchmarking for litserve? Also, I think using vllm would make sense if there are ton of parallel requests right?
@chandeldivyam Would be great to see florence-2 in vllm.
Hey @chandeldivyam, Is there a PR already to track the progress on Florence-2? Would be great to have Florence-2 with vllm 😀
Since there's been no update on this issue, this week I referred to the guide here and looked at how to add Phi3-vision to vLLM. I implemented the registry, but I ran into the following issue:
File "/app/vllm/entrypoints/llm.py", line 177, in __init__
self.llm_engine = LLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/vllm/engine/llm_engine.py", line 541, in from_engine_args
engine = cls(
^^^^
File "/app/vllm/engine/llm_engine.py", line 302, in __init__
self.model_executor = executor_class(
^^^^^^^^^^^^^^^
File "/app/vllm/executor/executor_base.py", line 47, in __init__
self._init_executor()
File "/app/vllm/executor/gpu_executor.py", line 38, in _init_executor
self.driver_worker = self._create_worker()
^^^^^^^^^^^^^^^^^^^^^
File "/app/vllm/executor/gpu_executor.py", line 105, in _create_worker
return create_worker(**self._get_create_worker_kwargs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/vllm/executor/gpu_executor.py", line 24, in create_worker
wrapper.init_worker(**kwargs)
File "/app/vllm/worker/worker_base.py", line 449, in init_worker
self.worker = worker_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/vllm/worker/worker.py", line 101, in __init__
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
^^^^^^^^^^^^^^^^^
File "/app/vllm/worker/enc_dec_model_runner.py", line 115, in __init__
assert_enc_dec_mr_supported_scenario(self)
File "/app/vllm/worker/utils.py", line 43, in assert_enc_dec_mr_supported_scenario
raise NotImplementedError(
NotImplementedError: Multimodal is not currently supported with encoder/decoder models.
This error indicates that the Florence2 configuration has is_encoder_decoder: true, but the current EncoderDecoderModelRunner does not support multimodal inputs. I think this will be hard to work around, and we really need this support. Can anyone give advice or suggest what to do next?
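For anyone reproducing this, the flag can be inspected straight from the Hugging Face config. A small sketch using transformers; the True value is what was observed here:

from transformers import AutoConfig

# Florence-2 ships custom modeling code, so trust_remote_code is required.
config = AutoConfig.from_pretrained("microsoft/Florence-2-base",
                                    trust_remote_code=True)
# Reported as True in this thread, which is why vLLM selects the
# encoder/decoder model runner and then hits the NotImplementedError above.
print(config.is_encoder_decoder)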
If only the language part of the model is using encoder-decoder (i.e. there is no cross-attention between text and visual features), then you can try implementing only the language part in vLLM first.
@DarkLight1337, thanks for your comment. I think I understand, and it seems feasible. Since Florence2 only uses the encoder-decoder for the language part, specifically in the Florence2LanguageModel class, I can implement the language part and the vision part (DaViT) separately, then combine them later. I just need to properly organize the roughly 2,800 lines of the original modeling_florence.py file.
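Roughly, the split could look like the sketch below. All class and method names are illustrative, not the actual vLLM interfaces:

import torch
import torch.nn as nn

# Illustrative sketch of the proposed split: a DaViT vision tower, the
# encoder-decoder Florence2LanguageModel, and a wrapper that merges
# projected image embeddings with the text embeddings.
class Florence2Sketch(nn.Module):
    def __init__(self, vision_tower: nn.Module, projector: nn.Module,
                 language_model: nn.Module):
        super().__init__()
        self.vision_tower = vision_tower      # DaViT backbone
        self.projector = projector            # vision features -> text hidden size
        self.language_model = language_model  # Florence2LanguageModel (BART-style)

    def forward(self, input_ids: torch.Tensor, pixel_values: torch.Tensor):
        # Encode the image and project into the language embedding space.
        image_embeds = self.projector(self.vision_tower(pixel_values))
        # Prepend image embeddings to the embedded text prompt, then run the
        # encoder-decoder language model on the combined sequence.
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)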
Hey, what's the update on this one? How do I run Florence-2 using vLLM?
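It is not supported yet at this point, but once support lands, usage would presumably follow vLLM's standard multimodal offline-inference pattern. A sketch under that assumption, not something that runs today:

from vllm import LLM, SamplingParams
from PIL import Image

# Sketch only: assumes Florence-2 support has landed and that it uses the
# same multimodal input interface as other VLMs in vLLM.
llm = LLM(model="microsoft/Florence-2-base", trust_remote_code=True)
image = Image.open("example.jpg")
outputs = llm.generate(
    {
        "prompt": "<CAPTION>",  # one of Florence-2's task prompts
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0, max_tokens=128),
)
print(outputs[0].outputs[0].text)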
+1
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Hey all, any progress on this?
cc @Isotr0py
Oh, I totally forgot about this... 😅 Let me port the ViT for the Florence models to finish this.
wow @Isotr0py @DarkLight1337 thank you for such a fast reaction