Bug with Gather Node
I am trying to import Depth-Anything-v2 into Burn through ONNX, but burn-import fails on a particular Gather node.
DEBUG onnx_ir::from_onnx: renaming node "/blocks.0/attn/Gather_3"
ERROR burn_import::logger: PANIC => panicked at /home/akshit/storage/projects/burn/crates/onnx-ir/src/rank_inference.rs:884:31:
attempt to subtract with overflow
To Reproduce
- Download this model - https://github.com/fabio-sim/Depth-Anything-ONNX/releases/download/v2.0.0/depth_anything_v2_vits_dynamic.onnx
- Try to import it in Burn
The error occurs in crates/onnx-ir/src/rank_inference.rs, line 884:
let output_rank = indices_rank + input_tensor.rank - 1;
The inputs to /blocks.0/attn/Gather_3 are:
Axis: 0
Data:
name: /blocks.0/attn/Transpose_output_0
tensor: float32[3,batch_size,6,floor(height/14)*floor(width/14) + 1,64]
Indices:
name: /Constant_5_output_0
category: Initializer
tensor: int64 0
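Here input_tensor.rank is incorrectly inferred as 0, even though the data tensor above has rank 5. Because the index is a scalar (indices_rank = 0), the expression evaluates 0 + 0 - 1, which underflows usize, hence the panic. The wrong rank comes from a preceding Reshape node.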
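For reference, a minimal sketch of the Gather rank rule from the ONNX spec (output rank = q + (r - 1), with r the data rank and q the indices rank). `gather_output_rank` is a hypothetical helper, not the actual onnx-ir code; it uses `checked_sub` so the bad case shows up as `None` instead of a panic.

```rust
/// Output rank of an ONNX Gather node: q + (r - 1), where r is the rank of
/// `data` and q is the rank of `indices` (a scalar index has q = 0).
/// `checked_sub` returns None for the impossible case r = q = 0 instead of
/// panicking with "attempt to subtract with overflow" in a debug build.
fn gather_output_rank(data_rank: usize, indices_rank: usize) -> Option<usize> {
    (indices_rank + data_rank).checked_sub(1)
}

fn main() {
    // Gather_3 as it should be seen: rank-5 data, scalar index.
    println!("{:?}", gather_output_rank(5, 0)); // Some(4)
    // Gather_3 as onnx-ir currently sees it: data rank wrongly inferred as 0.
    println!("{:?}", gather_output_rank(0, 0)); // None
}
```

Returning an error here would not fix the import, but it would point at the wrong upstream rank instead of an arithmetic overflow.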
Reshape takes the output shape as an input, but onnx-ir in its current state doesn't capture enough information for it. The shape field is not populated (i.e., None) for almost all tensors. To infer the output rank of a Reshape, we need the number of elements in the shape input, which we usually don't have.
And actually, the current implementation only checks for constant inputs, so even if the shape information were available, it would not be propagated.
https://github.com/tracel-ai/burn/blob/5d16339e6f74b857391da2e44564de6764a07d1a/crates/onnx-ir/src/rank_inference.rs#L348-L368
As a result, the output rank is set to 0, which is incorrect.
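A minimal sketch of what Reshape rank inference needs, assuming a hypothetical `ShapeInput` summary of what the IR knows about the `shape` input (these names are illustrative, not onnx-ir types). The point is that a constant shape or a known length for the 1D shape tensor both give the rank, and anything else should be reported as unknown rather than defaulting to 0.

```rust
/// Hypothetical summary of what the IR knows about Reshape's `shape` input.
enum ShapeInput {
    /// The shape is a constant initializer, e.g. [2, -1, 64].
    Constant(Vec<i64>),
    /// Only the static length of the 1D shape tensor is known, e.g. 5.
    KnownLength(usize),
    /// Only the rank of the shape tensor (always 1) is known.
    Unknown,
}

/// Output rank of Reshape = number of elements in the 1D `shape` input.
/// A -1 entry still counts as one output dimension.
/// Returns None instead of guessing 0 when that length is not known.
fn reshape_output_rank(shape: &ShapeInput) -> Option<usize> {
    match shape {
        ShapeInput::Constant(values) => Some(values.len()),
        ShapeInput::KnownLength(len) => Some(*len),
        ShapeInput::Unknown => None,
    }
}

fn main() {
    let inputs = [
        ShapeInput::Constant(vec![2, -1, 64]),
        ShapeInput::KnownLength(5),
        ShapeInput::Unknown,
    ];
    for input in &inputs {
        // Prints Some(3), then Some(5), then None.
        println!("{:?}", reshape_output_rank(input));
    }
}
```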
Ohk, so if I understand correctly, this seems to be a deeper issue that would not be easily solved by a few patches.
Should I just get started on writing the PyTorch import code then, or try to solve it?
> Ohk, so if I understand correctly, this seems to be a deeper issue that would not be easily solved by a few patches.
Yeah, this is somewhat of a limitation of the current IR. Shapes are almost never needed, since Burn tensors only require the rank, so they were left empty. The Reshape node is an exception that was not accounted for, so that assumption breaks down. The rank of the output tensor of a Reshape operation is determined by the number of elements in the 1D shape input (i.e., the size of its single dimension).
You might be able to fix the issue for this specific model by ensuring that the shape for that input is available up to this point. But this might require capturing shapes (not just ranks) for previous operations. Might be manageable if you can narrow it down to a couple of nodes, but you can see how this doesn't scale 😅
That's a big reason why we'd like to rework how shapes are handled.
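To make "capturing shapes for previous operations" concrete, here is a sketch that propagates only the static length of 1D shape tensors through a few ops that commonly build a Reshape's shape input in exported graphs. Which ops actually feed this Reshape in Depth-Anything-v2 is an assumption (not checked against the model), and `ShapeOp`/`output_length` are hypothetical names, not onnx-ir APIs.

```rust
/// Hypothetical, simplified node kinds that often build a Reshape's
/// `shape` input at runtime.
enum ShapeOp {
    /// Shape(x): 1D int64 tensor with one element per dimension of x
    /// (ignoring the optional start/end attributes added in opset 15).
    Shape { input_rank: usize },
    /// Unsqueeze of a scalar (e.g. one gathered dimension) into a 1-element tensor.
    UnsqueezeScalar,
    /// Concat along axis 0 of 1D tensors with (possibly unknown) lengths.
    Concat { input_lengths: Vec<Option<usize>> },
}

/// Static length of the 1D tensor produced by `op`, if it can be determined.
fn output_length(op: &ShapeOp) -> Option<usize> {
    match op {
        ShapeOp::Shape { input_rank } => Some(*input_rank),
        ShapeOp::UnsqueezeScalar => Some(1),
        // Summing Options yields None as soon as one input length is unknown.
        ShapeOp::Concat { input_lengths } => {
            input_lengths.iter().copied().sum::<Option<usize>>()
        }
    }
}

fn main() {
    // e.g. Concat(Shape(x) of a rank-2 tensor, Unsqueeze(d)) feeding a Reshape.
    let parts = [ShapeOp::Shape { input_rank: 2 }, ShapeOp::UnsqueezeScalar];
    let lengths: Vec<Option<usize>> = parts.iter().map(output_length).collect();
    let concat = ShapeOp::Concat { input_lengths: lengths };
    // The Reshape consuming this tensor would have rank 3.
    println!("{:?}", output_length(&concat)); // Some(3)
}
```

Every op that can sit between the graph inputs and a Reshape needs a rule like this, which is why patching it node by node doesn't scale.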
> Should I just get started on writing the PyTorch import code then, or try to solve it?
Totally up to you 🙂 They're two different approaches.
This has been fixed by https://github.com/tracel-ai/burn/pull/3381