optimum
Support for Florence 2 model
Feature request
When I try to export Florence-2, it fails with a bizarre error message that leads me to believe it's not supported.
```
D:\Redacted\>optimum-cli export onnx --model microsoft/Florence-2-large --trust-remote-code --framework pt flo2
D:\miniconda3\envs\onnx-export\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\miniconda3\envs\onnx-export\Scripts\optimum-cli.exe\__main__.py", line 7, in <module>
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\commands\optimum_cli.py", line 163, in main
    service.run()
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\commands\export\onnx.py", line 265, in run
    main_export(
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\exporters\onnx\__main__.py", line 280, in main_export
    model = TasksManager.get_model_from_task(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\optimum\exporters\tasks.py", line 1950, in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\onnx-export\Lib\site-packages\transformers\models\auto\auto_factory.py", line 566, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large.ef29c9b007f906bd278c39bc12ae620398d88c88.configuration_florence2.Florence2Config'> for this kind of AutoModel: AutoModelForVision2Seq.
Model type should be one of BlipConfig, Blip2Config, GitConfig, Idefics2Config, InstructBlipConfig, Kosmos2Config, LlavaConfig, LlavaNextConfig, PaliGemmaConfig, Pix2StructConfig, VideoLlavaConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
```
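For context, the checkpoint itself loads fine outside of optimum when you use the auto class the Florence-2 remote code actually registers with; the model card loads it via AutoModelForCausalLM, which is why the AutoModelForVision2Seq lookup above raises:

```python
# Sanity check (adapted from the Florence-2 model card): the remote code
# registers the model under AutoModelForCausalLM, not AutoModelForVision2Seq,
# so optimum's task lookup fails even though the checkpoint loads.
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)
print(type(model).__name__)  # Florence2ForConditionalGeneration
```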
Motivation
I want to export an ONNX model of Florence-2. I kind of thought that's the type of thing this tool is used for, yeah? Take a non-ONNX repo from Hugging Face and export an ONNX model.
Your contribution
Not really. I'm willing to test if there's something y'all want me to try?
+1
Can anyone give some tips on how to export?
I've refactored the DaViT part of Florence-2 to be compatible with Hugging Face, if this helps:
https://huggingface.co/amaye15/DaViT-Florence-2-large-ft
So did you succeed in exporting to ONNX?
Hi, thanks for your reply. Did you export the language model of Florence-2?
Not yet, might have a look at that this weekend.
@amaye15 Hi, thank you for your advice. Following your DaViT solution, I have succeeded in exporting my model, but I have no idea how to do post-processing; the generated result is an array of float32. Can you give me some advice on what to do next?
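For anyone with the same question, a minimal sketch of one way to post-process, assuming the float32 array is vocabulary logits of shape (batch, seq_len, vocab_size) and that you kept the Florence-2 processor around (its remote code ships a post_process_generation helper, shown on the model card). The names `logits`, `processor`, and `image` are placeholders for your own variables:

```python
import numpy as np

# Assumptions: `logits` is your decoder's float32 output, shaped
# (batch, seq_len, vocab_size); `processor` is the Florence-2 AutoProcessor;
# `image` is the PIL image you ran; "<OD>" is the task prompt you used.
token_ids = np.argmax(logits, axis=-1)  # greedy token pick per position
text = processor.batch_decode(token_ids, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(
    text, task="<OD>", image_size=(image.width, image.height)
)
print(parsed)  # e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}
```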
related to #1949
Any updates on this?
This issue has been marked as stale because it has been open for 30 days with no activity. This thread will be automatically closed in 5 days if no further activity occurs.
I've published my conversion code here, if anyone is interested :)
@xenova Joshua, thanks for publishing your Florence-2 ONNX conversion code. I apologize for the dumb question, but how would I take the set of ONNX files your export code produces and properly use them to do, say, pure object-detection inference with Florence-2 in PyTorch?
I'm guessing it would resemble something like the broken code below, but I'm not really sure how to proceed?

```python
import onnxruntime as ort
import numpy as np
from PIL import Image
from transformers import AutoProcessor

# Load ONNX models
vision_encoder_session = ort.InferenceSession("converted/vision_encoder.onnx")
encoder_session = ort.InferenceSession("converted/encoder_model.onnx")
decoder_session = ort.InferenceSession("converted/decoder_model_merged.onnx")

# Load the Florence-2 processor for preprocessing
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base-ft", trust_remote_code=True)

def process_image(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    return inputs["pixel_values"].numpy()

def encode_image(pixel_values):
    # Run the DaViT vision tower to get image features
    return vision_encoder_session.run(["image_features"], {"pixel_values": pixel_values})[0]

def initialize_past_key_values(num_layers, batch_size, num_heads, seq_length, head_dim):
    # Zero-filled KV cache for the first (non-cached) decoder pass
    shape = (batch_size, num_heads, seq_length, head_dim)
    return {
        f"past_key_values.{layer}.{branch}.{kind}": np.zeros(shape, dtype=np.float32)
        for layer in range(num_layers)
        for branch in ("decoder", "encoder")
        for kind in ("key", "value")
    }

def detect_objects(encoder_outputs):
    batch_size, seq_len, hidden_dim = encoder_outputs.shape
    encoder_attention_mask = np.ones((batch_size, seq_len), dtype=np.int64)
    num_layers = 6  # was undefined; the right count depends on the exported variant
    num_heads = 12
    head_dim = hidden_dim // num_heads  # calculate per-head dimension
    past_key_values = initialize_past_key_values(num_layers, batch_size, num_heads, seq_len, head_dim)
    inputs_embeds = np.zeros((batch_size, seq_len, hidden_dim), dtype=np.float32)
    use_cache_branch = np.array([False])  # boolean tensor in optimum's merged decoders
    decoder_inputs = {
        "encoder_attention_mask": encoder_attention_mask,
        "encoder_hidden_states": encoder_outputs,
        "inputs_embeds": inputs_embeds,
        "use_cache_branch": use_cache_branch,
        **past_key_values,
    }
    # Single forward pass; this argmax is not a real generation loop
    decoder_outputs = decoder_session.run(["logits"], decoder_inputs)[0]
    token_ids = np.argmax(decoder_outputs, axis=-1)
    return processor.tokenizer.batch_decode(token_ids, skip_special_tokens=True)

if __name__ == "__main__":
    pixel_values = process_image("test.jpg")
    image_features = encode_image(pixel_values)
    detected_objects = detect_objects(image_features)
    print(detected_objects)
```
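The snippet above runs the decoder once over zero embeddings rather than generating autoregressively. Below is a rough sketch of the greedy loop the merged decoder is built for, building on the snippet above and under several unverified assumptions: the export also produced an embed_tokens.onnx with an `input_ids` input (as the transformers.js Florence-2 conversions do), `encoder_hidden_states` comes from running encoder_model.onnx over the concatenated image features and embedded task prompt, the cache outputs follow optimum's `present.*` naming, a zero-length cache is accepted on the first pass, `num_layers`/`num_heads` match the base variant, and BOS is the decoder start token (check `config.decoder_start_token_id`):

```python
# Hypothetical greedy-decoding loop; names, shapes, and cache handling are
# assumptions, not verified against the actual export.
embed_session = ort.InferenceSession("converted/embed_tokens.onnx")  # assumed file

def generate_greedy(encoder_hidden_states, max_new_tokens=128):
    batch_size, enc_len, hidden_dim = encoder_hidden_states.shape
    encoder_attention_mask = np.ones((batch_size, enc_len), dtype=np.int64)
    num_layers, num_heads = 6, 12  # assumed Florence-2-base sizes
    head_dim = hidden_dim // num_heads
    # First pass: zero-length cache, cache branch off (optimum's convention)
    past = initialize_past_key_values(num_layers, batch_size, num_heads, 0, head_dim)
    use_cache = np.array([False])
    tokens = [processor.tokenizer.bos_token_id]  # assumed decoder start token
    output_names = [o.name for o in decoder_session.get_outputs()]  # "logits", "present.*"
    for _ in range(max_new_tokens):
        input_ids = np.array([tokens[-1:]], dtype=np.int64)
        inputs_embeds = embed_session.run(None, {"input_ids": input_ids})[0]
        outputs = decoder_session.run(None, {
            "encoder_attention_mask": encoder_attention_mask,
            "encoder_hidden_states": encoder_hidden_states,
            "inputs_embeds": inputs_embeds,
            "use_cache_branch": use_cache,
            **past,
        })
        next_token = int(np.argmax(outputs[0][0, -1]))
        tokens.append(next_token)
        if next_token == processor.tokenizer.eos_token_id:
            break
        # Feed "present.*" outputs back in as "past_key_values.*" inputs
        past = {name.replace("present", "past_key_values"): value
                for name, value in zip(output_names[1:], outputs[1:])}
        use_cache = np.array([True])
    return processor.tokenizer.decode(tokens, skip_special_tokens=False)
```

The decoded string can then go through `processor.post_process_generation` as in the earlier sketch to recover boxes and labels for detection tasks.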