Add ONNX export for ViTMatte models
What does this PR do?
As title says :)
I haven't added a test yet, since I couldn't find a tiny random model on the HF hub.
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
There's still an issue with shapes other than the one the model was exported with:
$ optimum-cli export onnx --model hustvl/vitmatte-small-distinctions-646 o --task image-matting
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.1.1+cu121
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:118: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_channels != self.num_channels:
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:100: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = int(math.sqrt(num_position))
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if size * size != num_position:
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:104: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if size != height or size != width:
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:411: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if pad_height > 0 or pad_width > 0:
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
max_rel_dist = int(2 * max(q_size, k_size) - 1)
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:153: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
max_rel_dist = int(2 * max(q_size, k_size) - 1)
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if rel_pos.shape[0] != max_rel_dist:
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:167: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
q_coords = torch.arange(q_size)[:, None] * max(k_size / q_size, 1.0)
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
k_coords = torch.arange(k_size)[None, :] * max(q_size / k_size, 1.0)
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:169: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
relative_coords = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:447: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if patch_height > height or patch_width > width:
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (alphas)
- Validating ONNX Model output "alphas":
-[✓] (2, 1, 64, 64) matches (2, 1, 64, 64)
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o
Running it with input of shape [1, 1, 64, 92] gives:
Error: Non-zero status code returned while running Gather node. Name:'/backbone/encoder/layer.2/attention/Gather_4' Status Message: indices element out of data bounds, idx=7 must be within the inclusive range [-7,6]
(but it works in PyTorch)
I've narrowed it down to these 2 lines (a quick illustration follows the links):
- https://github.com/huggingface/transformers/blob/df5c5c62ae253055336f5bb0828ca8e3e15ab6bd/src/transformers/models/vitdet/modeling_vitdet.py#L153
- https://github.com/huggingface/transformers/blob/df5c5c62ae253055336f5bb0828ca8e3e15ab6bd/src/transformers/models/vitdet/modeling_vitdet.py#L100
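For context, here's a minimal sketch (plain PyTorch, my own illustration rather than the actual vitdet code) of the failure mode behind those warnings: a Python-level cast on a shape-derived value is evaluated once at trace time and baked into the graph as a constant, so the exported model only works for the export shape.

import torch

def build_rel_pos(x):
    q_size = x.shape[-1]                # a plain Python int while tracing
    max_rel_dist = int(2 * q_size - 1)  # evaluated eagerly, so the result is frozen into the graph
    return torch.arange(max_rel_dist)

traced = torch.jit.trace(build_rel_pos, torch.zeros(1, 3, 7))
print(traced(torch.zeros(1, 3, 7)).shape)  # torch.Size([13])
print(traced(torch.zeros(1, 3, 9)).shape)  # torch.Size([13]) again, not 17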
cc @NielsRogge
Hi,
Sorry, I'm not an ONNX expert, so I'm not sure how those lines should be updated. It might be better to ping someone from the Optimum team.
@xenova If you add a test with a tiny model we can merge this!
I was looking on the Hub for a tiny random model, but I couldn't find one (so I skipped adding the test). If you'd like, I can add https://huggingface.co/hustvl/vitmatte-small-composition-1k (25M params).
However, there's still an issue with Python casts (int(...) and float(...)). Is there a recommended way to handle this? For my custom exports (see here), I've basically just overridden some of the Python casts to use PyTorch casts (.to(...)) instead.
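Concretely, the kind of rewrite I mean looks like this (a standalone illustration, not the actual vitdet source):

import torch

x = torch.tensor(17.9)
size_py = int(x)             # Python cast: runs at trace time, frozen as a constant
size_pt = x.to(torch.int64)  # tensor cast: stays an op in the exported graph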
Will convert to draft while we discuss this.
@xenova Thank you so much for the contribution. Since I'm quite new to ONNX, can you please give me an example of how to run inference with your exported ONNX model here in Python?
Here is what I did:
import onnxruntime
from PIL import Image
import numpy as np
from transformers import VitMatteImageProcessor, VitMatteForImageMatting
import torch
from huggingface_hub import hf_hub_download
ort_sess = onnxruntime.InferenceSession("./model/model.onnx",
providers=['CPUExecutionProvider'])
filepath = hf_hub_download(
repo_id="hf-internal-testing/image-matting-fixtures", filename="image.png", repo_type="dataset"
)
image = Image.open(filepath).convert("RGB")
filepath = hf_hub_download(
repo_id="hf-internal-testing/image-matting-fixtures", filename="trimap.png", repo_type="dataset"
)
trimap = Image.open(filepath).convert("L")
processor = VitMatteImageProcessor.from_pretrained("hustvl/vitmatte-small-composition-1k")
inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
alpha = ort_sess.run(inputs)
However, this gave me:
TypeError: run() missing 1 required positional argument: 'input_feed'
In addition, I wonder if this can be done:
providers=['CPUExecutionProvider'] # change to TensorrtExecutionProvider or CUDAExecutionProvider
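(For reference, onnxruntime takes a priority list of providers and falls back down the list, so something like this should work when onnxruntime-gpu is installed; an untested sketch on my side:)

ort_sess = onnxruntime.InferenceSession(
    "./model/model.onnx",
    providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'],
)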
Thank you so much for your help!
After digging deeper into this, I may have found the proper way to run inference with the exported model, yet I still got an error. Here was my snippet:
ort_sess.run(['alphas'], {'pixel_values': torch.rand(1, 4, 640, 960, dtype=torch.float32).cpu().numpy()})
However, executing the above code returned:
File /media/my_random_things/python3.8_environment/venv3.8_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
218 output_names = [output.name for output in self._outputs_meta]
219 try:
--> 220 return self._sess.run(output_names, input_feed, run_options)
221 except C.EPFail as err:
222 if self._enable_fallback:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_720' Status Message: indices element out of data bounds, idx=59 must be within the inclusive range [-7,6]
The inference in JS by @xenova available here seems very straightforward. However, since I'm not very familiar with JS, I can't tell what it does under the hood, yet I believe there should be a similarly simple way to do it in Python.
Can someone please help me with this? Thank you so much!
@EricLe-dev You were very close! Here's some example code:
import onnxruntime
from PIL import Image
import numpy as np
from transformers import VitMatteImageProcessor
from huggingface_hub import hf_hub_download
ort_sess = onnxruntime.InferenceSession(
"./model.onnx", providers=['CPUExecutionProvider']
)
filepath = hf_hub_download(
repo_id="hf-internal-testing/image-matting-fixtures", filename="image.png", repo_type="dataset"
)
image = Image.open(filepath).convert("RGB")
filepath = hf_hub_download(
repo_id="hf-internal-testing/image-matting-fixtures", filename="trimap.png", repo_type="dataset"
)
trimap = Image.open(filepath).convert("L")
processor = VitMatteImageProcessor.from_pretrained("hustvl/vitmatte-small-composition-1k")
inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
outputs = ort_sess.run(None, {'pixel_values' : inputs['pixel_values'].numpy()})
alphas = outputs[0]
# Visualize
result = Image.fromarray(np.uint8(alphas[0][0] * 255), mode='L')
result
Produces:
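(A note if you're running this as a plain script rather than in a notebook: the trailing result line won't display anything there, so save the matte instead:)

result.save("alpha.png")  # any path you like; PNG keeps the 8-bit matte intact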
@xenova Thank you so much for your answer. I encountered another issue that I would appreciate your help with. The code snippet you gave ran perfectly with the model.onnx that you exported (available here). However, it did not run properly with my own exported model.onnx, produced with exactly the same command as you proposed.
My current system is:
python 3.8.17
torch 1.11.0+cu113
transformers 4.39.2
optimum 1.16.0.dev0 #(installed from your branch)
The difference here is that my system has CUDA 11.3, and I could not install any PyTorch version newer than 1.11.0+cu113.
With the mentioned system, when I run:
optimum-cli export onnx --model hustvl/vitmatte-small-distinctions-646 o --task image-matting
It would complain that:
ImportError: cannot import name 'is_torch_less_than_1_11' from 'transformers.pytorch_utils'
In order to get it running, I did a little trick: I changed the line

if is_torch_less_than_1_11:

to

if False:

This trick allowed me to run the export command. All of my output was exactly like yours, except that it has:
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (alphas)
- Validating ONNX Model output "alphas":
-[✓] (2, 1, 64, 64) matches (2, 1, 64, 64)
-[x] values not close enough, max diff: 4.124641418457031e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- alphas: max diff = 4.124641418457031e-05.
The exported model was saved at: o
Running inference on the exported model with your code snippet gave me this error:
InvalidArgument Traceback (most recent call last)
Cell In[2], line 21
18 processor = VitMatteImageProcessor.from_pretrained("hustvl/vitmatte-small-composition-1k")
19 inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
---> 21 outputs = ort_sess.run(None, {'pixel_values' : inputs['pixel_values'].numpy()})
22 alphas = outputs[0]
24 # Visualize
File /media/my_random_things/python3.8_environment/venv3.8_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
218 output_names = [output.name for output in self._outputs_meta]
219 try:
--> 220 return self._sess.run(output_names, input_feed, run_options)
221 except C.EPFail as err:
222 if self._enable_fallback:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_720' Status Message: indices element out of data bounds, idx=59 must be within the inclusive range [-7,6]
Does it have anything to do with the version of Transformers, or the version of PyTorch (since you are using Torch 2.1.1+cu121 and Python 3.10)?
Can you please share your full pip list?
Your help is much appreciated!
@xenova Update:
After installing onnxruntime-gpu==1.14.0, I ran the following command:
optimum-cli export onnx --model hustvl/vitmatte-base-distinctions-646 o --task image-matting
I got exactly the same output as yours, which is:
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (alphas)
- Validating ONNX Model output "alphas":
-[✓] (2, 1, 64, 64) matches (2, 1, 64, 64)
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o
Running the code snippet you shared with your exported model works perfectly. However, running it with my exported model gave this error:
InvalidArgument Traceback (most recent call last)
Cell In[6], line 21
18 processor = VitMatteImageProcessor.from_pretrained("Xenova/vitmatte-base-composition-1k")
19 inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
---> 21 outputs = ort_sess.run(None, {'pixel_values' : inputs['pixel_values'].numpy()})
22 alphas = outputs[0]
24 # Visualize
File /media/wand/research/Research/python3.8_environment/venv3.8_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:200, in Session.run(self, output_names, input_feed, run_options)
198 output_names = [output.name for output in self._outputs_meta]
199 try:
--> 200 return self._sess.run(output_names, input_feed, run_options)
201 except C.EPFail as err:
202 if self._enable_fallback:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_720' Status Message: indices element out of data bounds, idx=59 must be within the inclusive range [-7,6]
Using Netron, I was able to see the difference between your exported model and my exported model.
Please see the attached images:
I believe that in order to get the code running, you changed some layers in the model. Can you please confirm this and give me some more ideas on how to get it working? Thank you so much!
@xenova Update 2: Sorry to keep posting in this thread, but the more I dive into this, the more interesting things I find, and I can't stop sharing them with you guys. It seems I was wrong: you did not modify the layers of the model; instead, there were some differences between my environment and yours.
Here is my pip list:
Package Version
------------------ ------------
aiohttp 3.9.3
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
certifi 2022.12.7
charset-normalizer 2.1.1
coloredlogs 15.0.1
datasets 2.18.0
dill 0.3.8
evaluate 0.4.1
filelock 3.9.0
flatbuffers 24.3.25
frozenlist 1.4.1
fsspec 2024.2.0
huggingface-hub 0.22.2
humanfriendly 10.0
idna 3.4
Jinja2 3.1.2
MarkupSafe 2.1.3
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.2.1
numpy 1.26.3
onnx 1.16.0
onnxruntime 1.17.1
onnxruntime-gpu 1.14.0
optimum 1.16.0.dev0
packaging 24.0
pandas 2.2.1
pillow 10.2.0
pip 24.0
protobuf 5.26.1
pyarrow 15.0.2
pyarrow-hotfix 0.6
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2023.12.25
requests 2.28.1
responses 0.18.0
safetensors 0.4.2
sentencepiece 0.2.0
setuptools 69.1.0
six 1.16.0
sympy 1.12
tokenizers 0.15.2
torch 2.1.1+cu118
torchaudio 2.1.1+cu118
torchvision 0.16.1+cu118
tqdm 4.66.2
transformers 4.36.0.dev0
triton 2.1.0
typing_extensions 4.8.0
tzdata 2024.1
urllib3 1.26.13
wheel 0.42.0
xxhash 3.4.1
yarl 1.9.4
I'm running the code with
Python 3.10.12
And this is my nvidia-smi output. My CUDA version is 11.4, but I could install Torch 2.1.1+cu118 and it still works, so I'm not sure if this is an issue.
Sun Mar 31 14:35:14 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 Off | N/A |
| 0% 38C P8 26W / 390W | 3107MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:82:00.0 On | N/A |
| 0% 44C P8 37W / 390W | 4396MiB / 24265MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Exporting the model with this environment gave me a slightly different model (viewed in Netron), yet it is still missing the Pow and some other layers compared to yours. This indicates that there is something wrong in my env. Can you please share the full environment that you used to export the model? Thank you so much!
As mentioned here, I did need to update some of the transformers modelling code to get the export working. In particular, you need to watch out for warnings that mention converting to a Python float/integer, and then change those calls to int() and float() to .to(torch.int64) and .to(torch.float32), respectively.
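For example, the two flagged lines could be rewritten along these lines, assuming the shape values arrive as 0-dim tensors during export (my reading of the warnings, not the exact patch):

import torch

num_position = torch.tensor(196)                     # e.g. 14 x 14 position embeddings
q_size, k_size = torch.tensor(64), torch.tensor(92)

# modeling_vitdet.py:100 -- was: size = int(math.sqrt(num_position))
size = torch.sqrt(num_position.float()).to(torch.int64)

# modeling_vitdet.py:153 -- was: max_rel_dist = int(2 * max(q_size, k_size) - 1)
max_rel_dist = (2 * torch.maximum(q_size, k_size) - 1).to(torch.int64)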
@xenova Thank you for your reply. Here was my output:
optimum-cli export onnx --model hustvl/vitmatte-base-distinctions-646 o --task image-matting
`AnnotionFormat` is deprecated and will be removed in v4.38. Please use `transformers.image_utils.AnnotationFormat` instead.
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Using framework PyTorch: 2.1.1+cu118
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:118: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_channels != self.num_channels:
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:100: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = int(math.sqrt(num_position))
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if size * size != num_position:
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:104: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if size != height or size != width:
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:411: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if pad_height > 0 or pad_width > 0:
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
max_rel_dist = int(2 * max(q_size, k_size) - 1)
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:153: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
max_rel_dist = int(2 * max(q_size, k_size) - 1)
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if rel_pos.shape[0] != max_rel_dist:
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:167: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
q_coords = torch.arange(q_size)[:, None] * max(k_size / q_size, 1.0)
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
k_coords = torch.arange(k_size)[None, :] * max(q_size / k_size, 1.0)
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:169: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
relative_coords = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)
/media/my_random_things/python3.10_environment/venv3.10_onnx/lib/python3.10/site-packages/transformers/models/vitdet/modeling_vitdet.py:447: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if patch_height > height or patch_width > width:
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating ONNX model o/model.onnx...
-[✓] ONNX model output names match reference model (alphas)
- Validating ONNX Model output "alphas":
-[✓] (2, 1, 64, 64) matches (2, 1, 64, 64)
-[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o
I believe what you meant was editing the lines mentioned in those warnings to use .to(torch.int64) and .to(torch.float32)?
For example, line 100 in modeling_vitdet.py:

size = int(math.sqrt(num_position))  # change this to: size = math.sqrt(num_position).to(torch.int64)

Is this what you did? Thank you so much!
After diving into thousands of lines of code (using VS Code, searching for direct Python casts of int(...) and float(...)), I could not find any calls to int() or float() that are directly related to transformers/models/vitdet or transformers/models/vitmatte.
My export output is exactly the same as yours, with the same TracerWarnings shown earlier.
Most of those warnings are about converting a tensor to a Python boolean. Two warning lines mention converting a tensor to a Python float or int:

- transformers/models/vitdet/modeling_vitdet.py:100: size = int(math.sqrt(num_position)). The value being cast is already a Python float (from math.sqrt), so it does not really cast any tensor here.
- transformers/models/vitdet/modeling_vitdet.py:153: max_rel_dist = int(2 * max(q_size, k_size) - 1). The result of 2 * max(q_size, k_size) - 1 is not of type torch.Tensor, meaning we don't do any int(torch.Tensor) here.
@xenova can you please point me to the right file or line of code that you modified? Thank you a ton!
Let's merge this once https://github.com/huggingface/transformers/pull/30065 is released
Question is, where the hell do I get the model?