fxmarty
Thank you for the report, will have a look shortly!
Hi @saaraahfar, if you are using e.g. https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/tree/main, the ONNX export is not supported in Optimum because the model uses custom modeling code. However, we could support it similarly to https://github.com/huggingface/optimum/pull/1874
Let me check now
The culprit is https://github.com/huggingface/transformers/blob/9fdf158aa0987f6073d2816ad004dc09226350e2/src/transformers/models/deformable_detr/modeling_deformable_detr.py#L695-L707, which uses a custom CUDA kernel. I believe the ONNX export is not able to handle the try/except fallback correctly. We will need to change the transformers code...
Can you add an entry for `deformable-detr` here? https://github.com/huggingface/optimum/blob/ccf4b4dbb6d5f4421551ed0d83e0eb07b0261257/tests/exporters/exporters_utils.py#L52 For example, with https://huggingface.co/hf-internal-testing/tiny-random-DeformableDetrModel/ . This is for the ONNX export tests.
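For reference, a minimal sketch of what that entry could look like. The dict name `PYTORCH_EXPORT_MODELS_TINY` is an assumption here; check the actual registry name in `exporters_utils.py`:

```python
# Hypothetical sketch: the registry maps a model type to a tiny test
# checkpoint on the Hub. The dict name below is assumed, not confirmed.
PYTORCH_EXPORT_MODELS_TINY = {
    # ...existing entries...
    "deformable-detr": "hf-internal-testing/tiny-random-DeformableDetrModel",
}
```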
Thanks @ashim-mahara. If https://github.com/huggingface/optimum/pull/992 lands in the next transformers release, we'll be able to merge, I believe.
Hi @Yosshi999, thank you for the report. PyTorch 1.11 is more than two years old; do you face the same issue with a more recent version of PyTorch?
Thank you. Is updating to torch==1.13 or torch>=2.0 an option for you?
Hi @ideasbyjin, as far as I know, nobody has worked on the export of bitsandbytes-quantized models to ONNX so far. cc @xenova have you worked on int4/int8 quantization, where...
Hi @un-certainty, yes, if you are using CUDAExecutionProvider, IO Binding is probably helpful. I don't have a proper benchmark at hand though. > Also I wonder if the...
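For context, a rough sketch of how IO Binding is typically used with ONNX Runtime's CUDAExecutionProvider. The model path and tensor names below (`"model.onnx"`, `"input"`, `"output"`) are placeholders for illustration, not from any specific setup, and a CUDA-enabled onnxruntime build is assumed:

```python
import numpy as np
import onnxruntime as ort

# Placeholder model path and tensor names; adapt to your exported model.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

io_binding = sess.io_binding()
x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Bind the input once; ORT copies it to the device, and the output is
# kept on the GPU instead of being copied back to host after every run.
io_binding.bind_cpu_input("input", x)
io_binding.bind_output("output", device_type="cuda")

sess.run_with_iobinding(io_binding)

# Copy the result back to host only when actually needed.
result = io_binding.copy_outputs_to_cpu()[0]
```

The benefit comes from avoiding repeated host/device transfers when the same session is run many times, which is why it tends to matter most in throughput-oriented serving loops.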