
Support for Florence 2 model

Open BrainSlugs83 opened this issue 1 year ago • 9 comments

Feature request

When trying to export Florence 2, it fails with a bizarre error message that leads me to believe it's not supported.

```
D:\Redacted\>optimum-cli export onnx --model microsoft/Florence-2-large --trust-remote-code --framework pt flo2
D:\miniconda3\envs\onnx-export\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\miniconda3\envs\onnx-export\Scripts\optimum-cli.exe\__main__.py", line 7, in <module>
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\commands\optimum_cli.py", line 163, in main
    service.run()
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\commands\export\onnx.py", line 265, in run
    main_export(
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 280, in main_export
    model = TasksManager.get_model_from_task(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\exporters\tasks.py", line 1950, in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\transformers\models\auto\auto_factory.py", line 566, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large.ef29c9b007f906bd278c39bc12ae620398d88c88.configuration_florence2.Florence2Config'> for this kind of AutoModel: AutoModelForVision2Seq.
Model type should be one of BlipConfig, Blip2Config, GitConfig, Idefics2Config, InstructBlipConfig, Kosmos2Config, LlavaConfig, LlavaNextConfig, PaliGemmaConfig, Pix2StructConfig, VideoLlavaConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
```
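
For what it's worth, the checkpoint itself loads fine in plain transformers with remote code enabled (this is the loading path the Florence-2 model card shows), so the failure above looks like a gap in optimum's task-to-AutoModel mapping rather than a problem with the model:

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Florence-2 ships custom modeling code, so it loads through
# AutoModelForCausalLM + trust_remote_code (per the model card),
# not through AutoModelForVision2Seq, which is what optimum tries above.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
```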

Motivation

I want to export an ONNX model of Florence 2. I kind of thought that's the type of thing this tool is used for, yeah? Take a non-ONNX repo from Hugging Face and export an ONNX model.

Your contribution

Not really. I'm willing to test if there's something y'all want me to try?

BrainSlugs83 avatar Jun 26 '24 03:06 BrainSlugs83

+1

qingfengcss avatar Jul 19 '24 08:07 qingfengcss

Can anyone give some tips on how to export?

dragen1860 avatar Jul 25 '24 03:07 dragen1860

I've refactored the DaViT part of Florence to be compatible with Hugging Face, if this helps:

https://huggingface.co/amaye15/DaViT-Florence-2-large-ft

amaye15 avatar Jul 30 '24 07:07 amaye15

> I've refactored the DaViT part of Florence to be compatible with Hugging Face, if this helps:
>
> https://huggingface.co/amaye15/DaViT-Florence-2-large-ft

So did you succeed in exporting to ONNX?

dragen1860 avatar Jul 31 '24 01:07 dragen1860

Here's the code to export the vision tower (DaViT) to ONNX.

Link
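
In rough outline, the export looks like the sketch below. Treat it as a sketch only: whether the repo loads through plain AutoModel, the 768x768 input resolution, and the tensor names are assumptions that may differ from the linked code.

```python
import torch
from transformers import AutoModel

# Load the refactored DaViT vision tower (custom code, hence trust_remote_code).
# Assumption: the repo exposes itself through AutoModel.
model = AutoModel.from_pretrained(
    "amaye15/DaViT-Florence-2-large-ft", trust_remote_code=True
)
model.eval()

# Dummy input batch; 768x768 is assumed as the Florence-2 input resolution.
dummy_pixel_values = torch.randn(1, 3, 768, 768)

torch.onnx.export(
    model,
    (dummy_pixel_values,),
    "davit_vision_tower.onnx",
    input_names=["pixel_values"],
    output_names=["image_features"],
    dynamic_axes={
        "pixel_values": {0: "batch"},
        "image_features": {0: "batch"},
    },
    opset_version=17,
)
```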

amaye15 avatar Jul 31 '24 03:07 amaye15

> I've refactored the DaViT part of Florence to be compatible with Hugging Face, if this helps:
>
> https://huggingface.co/amaye15/DaViT-Florence-2-large-ft

Hi, thanks for your reply. Did you export the language model of Florence-2?

dragen1860 avatar Aug 01 '24 03:08 dragen1860

Not yet, might have a look at that this weekend.

amaye15 avatar Aug 01 '24 08:08 amaye15

@amaye15 Hi, thank you for your advice. Following your DaViT solution, I have succeeded in exporting my model, but I have no idea how to do the post-processing: the generated result is an array of float32. Can you give me some advice on what to do next?
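
For context, here is roughly where I am stuck: I can get the float32 logits out of the decoder, and I assume the next step is something like the sketch below (the post_process_generation call is taken from the Florence-2 model card; logits, processor, and image are assumed to already be in scope from the earlier steps):

```python
import numpy as np

# logits: float32 array of shape (batch, sequence_length, vocab_size)
# coming out of the exported decoder.
token_ids = np.argmax(logits, axis=-1)

# Turn token ids back into text. Florence-2 encodes locations as special
# tokens, so special tokens are NOT skipped here (as in the model card).
generated_text = processor.batch_decode(token_ids, skip_special_tokens=False)[0]

# The Florence-2 processor has a task-aware post-processing step that
# parses the generated text into boxes and labels (per the model card).
parsed = processor.post_process_generation(
    generated_text,
    task="<OD>",
    image_size=(image.width, image.height),
)
print(parsed)
```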

lieding avatar Sep 22 '24 08:09 lieding

related to #1949

tengomucho avatar Oct 09 '24 07:10 tengomucho

Any updates on this?

tgalery avatar Nov 20 '24 10:11 tgalery

This issue has been marked as stale because it has been open for 30 days with no activity. This thread will be automatically closed in 5 days if no further activity occurs.

github-actions[bot] avatar Dec 21 '24 02:12 github-actions[bot]

I've published my conversion code here, if anyone is interested :)

xenova avatar Feb 15 '25 21:02 xenova

@xenova Joshua, thanks for publishing your Florence-2 ONNX conversion code. I apologize for the dumb question, but how would I take the set of ONNX files your export code produces and properly use them to do, say, pure object-detection inference with Florence-2 in PyTorch?

I'm guessing it would resemble something like the broken code below, but I'm not really sure how to proceed?

```python
import onnxruntime as ort
import numpy as np
import torch
from PIL import Image
from transformers import AutoProcessor

# Load ONNX models
vision_encoder_session = ort.InferenceSession("converted/vision_encoder.onnx")
encoder_session = ort.InferenceSession("converted/encoder_model.onnx")
decoder_session = ort.InferenceSession("converted/decoder_model_merged.onnx")

# Load the Florence-2 processor for preprocessing
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)

def process_image(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    pixel_values = inputs["pixel_values"].numpy()
    return pixel_values

def encode_image(pixel_values):
    image_features = vision_encoder_session.run(["image_features"], {"pixel_values": pixel_values})[0]
    return image_features

def initialize_past_key_values(num_layers, batch_size, num_heads, seq_length, head_dim):
    past_key_values = {
        f"past_key_values.{layer}.decoder.key": np.zeros((batch_size, num_heads, seq_length, head_dim), dtype=np.float32)
        for layer in range(num_layers)
    }
    past_key_values.update({
        f"past_key_values.{layer}.decoder.value": np.zeros((batch_size, num_heads, seq_length, head_dim), dtype=np.float32)
        for layer in range(num_layers)
    })
    past_key_values.update({
        f"past_key_values.{layer}.encoder.key": np.zeros((batch_size, num_heads, seq_length, head_dim), dtype=np.float32)
        for layer in range(num_layers)
    })
    past_key_values.update({
        f"past_key_values.{layer}.encoder.value": np.zeros((batch_size, num_heads, seq_length, head_dim), dtype=np.float32)
        for layer in range(num_layers)
    })
    return past_key_values

def detect_objects(encoder_outputs):
    batch_size, seq_len, hidden_dim = encoder_outputs.shape
    encoder_attention_mask = np.ones((batch_size, seq_len), dtype=np.int64)
    num_heads = 12
    head_dim = hidden_dim // num_heads  # Calculate per-head dimension
    # NOTE: num_layers is undefined at this point, which is part of what I'm stuck on
    past_key_values = initialize_past_key_values(num_layers, batch_size, num_heads, seq_len, head_dim)
    inputs_embeds = np.zeros((batch_size, seq_len, hidden_dim), dtype=np.float32)
    use_cache_branch = np.array([0], dtype=np.int64)
    decoder_inputs = {
        "encoder_attention_mask": encoder_attention_mask,
        "encoder_hidden_states": encoder_outputs,
        "inputs_embeds": inputs_embeds,
        "use_cache_branch": use_cache_branch,
        **past_key_values,
    }
    decoder_outputs = decoder_session.run(["logits"], decoder_inputs)[0]
    token_ids = np.argmax(decoder_outputs, axis=-1)
    detected_objects = processor.tokenizer.batch_decode(token_ids, skip_special_tokens=True)
    return detected_objects

if __name__ == "__main__":
    image_path = "test.jpg"
    pixel_values = process_image(image_path)
    image_features = encode_image(pixel_values)
    detected_objects = detect_objects(image_features)
```

gicu8ab2 avatar Feb 27 '25 16:02 gicu8ab2