ONNX models with import or runtime issues
Tracking: Models That Fail to Import or Run in Burn
This issue tracks ONNX models that, when using burn-import, currently cannot be converted from ONNX, produce Rust code that fails to build, fail at runtime, or produce inaccurate outputs. If you encounter a model that fails to import or execute, please comment below or submit a new issue and reference this tracker.
Checklist of Models with Known Issues (working if checked)
Natural Language Processing (NLP)
- [x] ALBERT/BERT models (#1811)
- [x] all-MiniLM-L6-v2 (#600)
- [x] ModernBERT-base (#3130)
- [ ] IBM Granite 4.0 Tiny Preview (models#71)
Multimodal (Vision-Language)
- [x] CLIP ViT-B-32 (Text) (clip-ViT-B-32-text/model.onnx)
- [x] clip-ViT-B-32 Vision
Object Detection
- [x] YOLOv5 model (#1130)
- [x] YOLOv8n (#2822, #675)
- [ ] YOLOv10 model (#2816)
- [x] Yolo11x model
- [x] YOLOv12m (yolov12m.onnx)
- [ ] YOLO12x
- [ ] Retinaface model (#2037)
- [ ] Face Detector (MediaPipe) (#1370)
Depth Estimation
- [ ] Depth-Anything-v2 (#2592, #2926)
- [ ] dpt-dinov2-small-kitti (#2592)
- [ ] Apple Depth Pro
Generative Models
- [ ] stable-diffusion-xl-base-1.0 Blocked by #3812
Audio/Speech
- [ ] Silero VAD (#1941) Blocked by #724
- [ ] kokoro (onnx model)
Computer Vision - General
- [ ] Resnet (onnx model)
- [ ] XFeat (#3110)
- [ ] Piiranha (#2968)
Optical Flow & Pose Estimation
How to Use
- Add a comment if you find a new failing model, or if your model is fixed by a PR.
- Check off models as they become supported or fixed.
- Reference this issue when creating new ONNX import failure reports.
For operator-level or feature gaps, please also check:
- SUPPORTED-ONNX-OPS.md for coverage
- Help Wanted: Implementing ONNX Ops for operator implementation tracking
- burn-import issues
Quick update: I am able to convert yolo11x_opset16.onnx using this WIP PR: #3381
CLIP ViT-B-32 is buildable with https://github.com/tracel-ai/burn/pull/3560 (still under review)
I'll be working on a test harness to test various large models quickly.
RTMW3D-x is buildable with https://github.com/tracel-ai/burn/pull/3564 (still under review)
I am able to import https://huggingface.co/Xenova/albert-large-v2/resolve/main/onnx/model.onnx but the generated Rust code has type errors.
In Rust, it shows this error:
=== Tensor Operation Error ===
Operation: 'Reshape'
Reason:
1. The given shape doesn't have the same number of elements as the current tensor. Current shape: [1035], target shape: [1, 1034].
The program in Python:
def __init__(self, input_dim=1035):
    super(Net, self).__init__()
    self.attention = ChannelAttention(input_dim)
    self.fc1 = nn.Linear(input_dim, 512)
    self.fc2 = nn.Linear(512, 256)
    self.fc3 = nn.Linear(256, 128)
    self.fc4 = nn.Linear(128, 2)
    self.relu = nn.ReLU()
torch.onnx.export(model,
dummy_input,
'd:/archicad/model_data/f_best_2.onnx',
export_params=True,
opset_version=12,
do_constant_folding=True,
input_names=['input'],
output_names=['output'],
dynamic_axes={})
I have used uv run --script https://raw.githubusercontent.com/tracel-ai/burn/refs/heads/main/crates/burn-import/onnx_opset_upgrade.py to upgrade the model to opset version 16.
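The Reshape error above comes down to an element-count check: [1035] holds 1035 elements while [1, 1034] holds 1034, so no reshape between them can be valid. A minimal sketch of that check in plain Rust follows; the function names are hypothetical and this is not Burn's actual implementation.

```rust
/// Number of elements described by a shape (product of its dimensions).
/// An empty shape (a scalar) yields 1.
fn num_elements(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Returns Ok(()) when `current` can be reshaped into `target`,
/// otherwise an error message that reports both element counts.
fn check_reshape(current: &[usize], target: &[usize]) -> Result<(), String> {
    let (from, to) = (num_elements(current), num_elements(target));
    if from == to {
        Ok(())
    } else {
        Err(format!(
            "cannot reshape {:?} ({} elements) into {:?} ({} elements)",
            current, from, target, to
        ))
    }
}
```

With the shapes from the report, check_reshape(&[1035], &[1, 1034]) returns an error, which suggests that one of the two sizes (the model input or the exported Reshape target) is off by one.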
I am able to import https://huggingface.co/Xenova/albert-large-v2/resolve/main/onnx/model.onnx but the generated Rust code has type errors.
I have 3 outstanding ONNX related PRs (#3563, #3564, #3550) with fixes. Most likely it's caused by this: #3564.
Hopefully @laggui will have some time to review ;-)
Resnet is buildable:
Can we add kokoro to this? It is a very popular TTS model. It currently has an issue with expand1 - rank, which is an already known issue. It also uses ALBERT internally, so it will probably require the ALBERT model to be working first. Model: https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx There is also the bin file for voices: https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
Even though the Yolo11x ONNX file can be converted into Rust code, the generated code currently can't be built because broadcasting is not handled (see why automatic full broadcasting is lacking in Burn: https://github.com/tracel-ai/burn/issues/1499). I have fixed this broadcasting issue here: https://github.com/tracel-ai/burn/pull/3589
Afterwards, there is another runtime issue with slice that I need to investigate.
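For context, ONNX element-wise operators use NumPy-style multidirectional broadcasting: shapes are aligned from the right, and each pair of dimensions must either match or contain a 1. The sketch below shows only that shape rule in plain Rust. The helper name is hypothetical and it is not the code from the PR above.

```rust
/// Computes the broadcast shape of two shapes under NumPy/ONNX rules.
/// Returns None when the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Align from the right: missing leading dimensions count as 1.
        let da = if i < rank - a.len() { 1 } else { a[i - (rank - a.len())] };
        let db = if i < rank - b.len() { 1 } else { b[i - (rank - b.len())] };
        out[i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}
```

For example, shapes [2, 3, 4] and [4] broadcast to [2, 3, 4], while [2, 3] and [4] are incompatible. Generated code for a statically ranked tensor type has to make the implied leading 1 dimensions explicit before an element-wise op.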
The stable-diffusion-xl-base-1.0 VAE decoder is failing to import due to a scalar input in ConstantOfShape: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/vae_decoder/model.onnx
ERROR burn_import::logger: PANIC => panicked at C:\Users\Danilo\.cargo\registry\src\index.crates.io-1949cf8c6b5b557f\onnx-ir-0.18.0\src\node\constant_of_shape.rs:34:18:
ConstantOfShape node must have a Tensor with a non-empty static shape value
The value attribute as shown in Netron:
tensor: float32[1]
[
0
]
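As background, the ONNX specification defines the ConstantOfShape input as a 1-D shape tensor and says that an empty shape produces a scalar output, filled from the one-element value attribute. A minimal sketch of those semantics over a flat buffer follows. The function name is hypothetical, and whether the empty-shape case is what triggers the panic above is an assumption.

```rust
/// Sketch of ConstantOfShape semantics: build a flat buffer for `shape`,
/// with every element set to `value`.
/// An empty shape (rank 0) denotes a scalar, which still holds one element.
fn constant_of_shape(shape: &[usize], value: f32) -> Vec<f32> {
    // The product over zero dimensions is 1, which covers the scalar case.
    let len: usize = shape.iter().product();
    vec![value; len]
}
```

Treating the empty shape as "one element" rather than as an error is what lets the scalar case go through the same code path as every other shape.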
I have added a harness to test models: https://github.com/tracel-ai/burn/tree/main/crates/burn-import/model-checks
Yolo11x is passing for the tch and ndarray backends, but it currently fails on metal due to a bug in metal: https://github.com/tracel-ai/burn/issues/3600
I am working on clip-vit-b-32-text next. Locally it's passing (with 5-6 fixes in burn-import related to broadcasting and other issues). I have identified one bug in ndarray backend related to int lower open. I will submit a PR shortly and a bug report for ndarray (if I can't fix it).
I recently had issues with Apple Depth Pro; could it be added? It has a script for weights.
Another one of interest to me is LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias also with a list of checkpoints, also with issues importing in Burn (on a recent main).
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/vae_encoder/model.onnx
Failing to load with
Slice: steps other than 1 are not supported
step is -1
With the latest branch, I have the same issue with yolor, looking forward to fully utilize burn-import!!!
The slice limitation currently comes from Burn's tensor.slice(...) which doesn't accept step != 1 for ranges. We should improve this.
I will check what exactly -1 step means. If it inverts the slice or if it simply changes the index relative position/orientation
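On the question of what a step of -1 means: in the ONNX Slice specification a negative step walks the axis backward, so a slice with starts=-1, a very small ends value (the spec recommends INT_MIN), and steps=-1 reverses the axis. The sketch below follows the spec's clamping rules for a single 1-D axis in plain Rust. The function name is hypothetical and this is not Burn's tensor.slice.

```rust
/// Sketch of ONNX Slice along one axis of a 1-D input.
/// Negative `start`/`end` count from the back; a negative `step` walks backward.
fn slice_1d(data: &[i64], start: i64, end: i64, step: i64) -> Vec<i64> {
    assert!(step != 0, "step must be non-zero");
    let dim = data.len() as i64;
    if dim == 0 {
        return Vec::new();
    }
    // Negative indices are offsets from the end of the axis.
    let norm = |i: i64| if i < 0 { i + dim } else { i };
    // Clamp ranges differ for forward and backward stepping.
    let (start, end) = if step > 0 {
        (norm(start).clamp(0, dim), norm(end).clamp(0, dim))
    } else {
        (norm(start).clamp(0, dim - 1), norm(end).clamp(-1, dim - 1))
    };
    let mut out = Vec::new();
    let mut i = start;
    while (step > 0 && i < end) || (step < 0 && i > end) {
        out.push(data[i as usize]);
        i = i.saturating_add(step);
    }
    out
}
```

With data [1, 2, 3, 4], slice_1d(&data, -1, i64::MIN, -1) yields the reversed sequence, and a step of -2 yields every other element starting from the back. So a step of -1 inverts the order of the selected range rather than only changing how indices are interpreted.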
CLIP ViT-B/32 text model ONNX:
- Convertible to Rust
- Buildable
- Runnable
- Accurate compared with ONNX Runtime outputs
The PR is under review: https://github.com/tracel-ai/burn/pull/3623
CLIP ViT-B-32 fixes are merged. tch, metal and ndarray backends work.
To test:
cd crates/burn-import/model-checks/clip-vit-b-32-text
./get_model.py #or uv run ./get_model.py or python ./get_model.py
cargo run --release # tch default
#ndarray
cargo run --release --no-default-features --features ndarray
#metal
cargo run --release --no-default-features --features metal
I can confirm Retinaface works fine now too
Arcface can be converted, but panics at runtime with the metal backend, probably some of the same issues as in #3635 https://huggingface.co/FoivosPar/Arc2Face/blob/main/arcface.onnx
Yolo11x with static shape works, but with dynamic shape it fails to import (exported in Python with: from ultralytics import YOLO; YOLO("yolo11x.pt").export(format="onnx", dynamic=True))
Thanks for reporting
clip-ViT-B-32 Vision can be converted, but the converted code fails to compile due to incompatible tensor dims
error[E0308]: mismatched types
--> clip_vision.rs:835:42
|
835 | let add1_out1 = concat3_out1.add(gather2_out1);
| --- ^^^^^^^^^^^^ expected `3`, found `2`
| |
| arguments to this method are incorrect
|
= note: expected struct `burn::tensor::Tensor<_, 3>`
found struct `burn::tensor::Tensor<_, 2>`
error[E0308]: mismatched types
--> clip_vision.rs:3259:9
|
698 | pub fn forward(&self, input1: Tensor<B, 4>) -> Tensor<B, 2> {
| ------------ expected `burn::tensor::Tensor<B, 2>` because of return type
...
3259 | div27_out1
| ^^^^^^^^^^ expected `2`, found `3`
|
= note: expected struct `burn::tensor::Tensor<_, 2>`
found struct `burn::tensor::Tensor<_, 3>`
Fixed: https://github.com/tracel-ai/burn/pull/3673
The issue was in Gather operator.
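For reference, the ONNX Gather operator produces an output of rank rank(data) + rank(indices) - 1: the gathered axis of the data is replaced by the full shape of the indices. A scalar (rank-0) index therefore drops the axis, while a one-element rank-1 index keeps it, which is the kind of difference that shows up as Tensor<_, 3> versus Tensor<_, 2> in generated code. The sketch below illustrates only the shape rule. The helper name is hypothetical, and whether this was the exact cause fixed in the PR is an assumption.

```rust
/// Output shape of ONNX Gather: the `axis` dimension of `data`
/// is replaced by the whole `indices` shape.
fn gather_output_shape(data: &[usize], indices: &[usize], axis: usize) -> Vec<usize> {
    assert!(axis < data.len(), "axis out of range");
    let mut out = Vec::with_capacity(data.len() + indices.len() - 1);
    out.extend_from_slice(&data[..axis]); // dims before the axis
    out.extend_from_slice(indices); // the indices shape takes the axis' place
    out.extend_from_slice(&data[axis + 1..]); // dims after the axis
    out
}
```

For data of shape [1, 50, 768] gathered on axis 1, scalar indices give [1, 768] and indices of shape [1] give [1, 1, 768].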
@AdrianEddy #3673 is merged. It also addresses your concern regarding: indices = Tensor::<B, 1, _>::from_data
In the PR fix (#3673), the indices are now loaded from the weights file. Constant indices are preserved and not converted to static values unless the indices are used to gather from a Shape input. This way there is no copying back and forth.
I can confirm these models run as expected for me now:
- CLIP ViT-B-32 (Text)
- CLIP ViT-B-32 (Vision)
- RetinaFace-resnet50
- Arcface
- Yolo11m (static shape)
I'm super happy with this and grateful for all your hard work, now I can replace ort with Burn entirely in my app!
Fun fact: Burn is 8x faster on macOS than onnxruntime with these models
With PR #3736, the facenet model runs as expected.
Yolov8n works: https://github.com/tracel-ai/burn/pull/3750
YOLOv10 model requires scalar topK input and Mod op.
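As a reminder of what the Mod op has to implement: per the ONNX specification, with fmod=0 (the default, for integer inputs) the result takes the sign of the divisor, while fmod=1 behaves like C's fmod and takes the sign of the dividend. Rust's % operator matches the second case only. A minimal integer sketch follows; the function names are hypothetical and this is not the burn-import implementation.

```rust
/// ONNX Mod with fmod=0 on integers: the result takes the sign of the divisor.
/// Panics if `b` is zero.
fn mod_floor(a: i64, b: i64) -> i64 {
    let r = a.wrapping_rem(b);
    // Shift the truncated remainder when its sign disagrees with the divisor.
    if r != 0 && ((r < 0) != (b < 0)) {
        r + b
    } else {
        r
    }
}

/// ONNX Mod with fmod=1: truncated remainder, the result takes the sign of the dividend.
/// Panics if `b` is zero.
fn mod_trunc(a: i64, b: i64) -> i64 {
    a.wrapping_rem(b)
}
```

The two variants differ only when the operands have opposite signs, for example -7 and 3.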
albert/albert-base-v2 model works with this PR fix: https://github.com/tracel-ai/burn/pull/3810
For tch backend on M3 Mac with params: 89,650,188:
========================================
ALBERT Base v2 Burn Model Test
========================================
Initializing ALBERT Base v2 model...
Model initialized in 86.22ms
Saving model structure to artifacts/albert-base-v2_model.txt...
Model structure saved
Loading test data from artifacts/albert-base-v2_test_data.pt...
Data loaded in 1.12ms
Loaded input tensors:
input_ids shape: [1, 128]
attention_mask shape: [1, 128]
token_type_ids shape: [1, 128]
Loaded reference outputs:
last_hidden_state shape: [1, 128, 768]
pooler_output shape: [1, 768]
Running model inference with test input...
Inference completed in 33.69ms
Model output shapes:
output 0 (last_hidden_state): [1, 128, 768]
output 1 (pooler_output): [1, 768]
Comparing model outputs with reference data...
Checking last_hidden_state...
✓ last_hidden_state matches reference data within tolerance (1e-4)!
Checking pooler_output...
✓ pooler_output matches reference data within tolerance (1e-4)!
========================================
Model test completed!
========================================
With the ndarray backend:
========================================
ALBERT Base v2 Burn Model Test
========================================
Initializing ALBERT Base v2 model...
Model initialized in 59.57ms
Saving model structure to artifacts/albert-base-v2_model.txt...
Model structure saved
Loading test data from artifacts/albert-base-v2_test_data.pt...
Data loaded in 502.13µs
Loaded input tensors:
input_ids shape: [1, 128]
attention_mask shape: [1, 128]
token_type_ids shape: [1, 128]
Loaded reference outputs:
last_hidden_state shape: [1, 128, 768]
pooler_output shape: [1, 768]
Running model inference with test input...
Inference completed in 155.25ms
Model output shapes:
output 0 (last_hidden_state): [1, 128, 768]
output 1 (pooler_output): [1, 768]
Comparing model outputs with reference data...
Checking last_hidden_state...
✓ last_hidden_state matches reference data within tolerance (1e-4)!
Checking pooler_output...
✓ pooler_output matches reference data within tolerance (1e-4)!
========================================
Model test completed!
========================================
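The "matches reference data within tolerance (1e-4)" lines in the logs above amount to an element-wise closeness check between the Burn outputs and the reference outputs. A minimal sketch in plain Rust, assuming a purely absolute tolerance; the actual harness may use a different criterion (for example a combined relative and absolute tolerance), and the function name is hypothetical.

```rust
/// Returns true when both buffers have the same length and every pair of
/// elements differs by at most `tol` in absolute value.
/// Any NaN makes the comparison fail, because `NaN <= tol` is false.
fn within_tolerance(actual: &[f32], expected: &[f32], tol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(x, y)| (x - y).abs() <= tol)
}
```

Checking each element with all(...) instead of folding a maximum avoids silently accepting NaN values, since f32::max ignores NaN operands.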
There is a current limitation in burn-import implementation for large ONNX files (> 2GB). See https://github.com/tracel-ai/burn/issues/3812
That's why stable-diffusion-xl-base-1.0 can't be supported at the moment.
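For background, the 2GB figure matches the size limit protobuf places on a single serialized message, which is why large ONNX models usually store their weights as external data next to the .onnx file. Whether that is the exact cause behind the linked issue is an assumption. The sketch below only shows a pre-flight size check in plain Rust; the constant and function names are hypothetical.

```rust
/// Largest size in bytes that fits in a signed 32-bit length (2 GiB - 1).
const MAX_SINGLE_MESSAGE_BYTES: u64 = i32::MAX as u64;

/// Returns true when a file of `size_bytes` is too large to be read as a
/// single in-memory protobuf message under the limit above.
fn exceeds_single_message_limit(size_bytes: u64) -> bool {
    size_bytes > MAX_SINGLE_MESSAGE_BYTES
}
```

The size passed in could come from std::fs::metadata(path)?.len(), so that oversized models are reported before conversion starts.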