Unifying model with inputs.
Two models I'm trying to load in tract have dots in their symbol names: DynamicDimension.0 and DynamicDimension.1. I fixed that, but I'm confused why tract can't unify a symbol with a value. I also tried setting it with --set DynamicDimension.0=1 and asserting it with --assert DynamicDimension.0==1, but nothing seems to work.
cargo run --release -- ../../ocr/models/ch_PP-OCRv5_rec_mobile_infer.onnx -i 1,3,48,218
2025-08-19T13:41:26.035560510Z ERROR tract] Error at stage "analyse"
Caused by:
0: ModelBuildingError
1: Failed analyse for node #249 "Conv.0" ConvHir
2: Infering facts
3: Applying rule outputs[0].shape == 1,16,24,109
4: Unifying shapes DynamicDimension.0,16,24,1+(DynamicDimension.1)/2 and 1,16,24,109
5: xx: Impossible to unify Sym(DynamicDimension.0) with Val(1).
Hello! Thanks for your interest in tract.
First, let's clarify a couple of things, just in case.
- During analysis and model preparation, symbols in tract are "universal", not "existential": the model must be valid for any value or combination of values of the symbols, within the limits fixed by the assertions.
- During execution, they switch to being "existential": as soon as the concrete shape of an input allows tract to infer the actual integer hiding behind a symbol for this turn, the value is fixed for the turn.
- "-i" happens very early, before analysis. It is meant to override shapes in models. For some time, tract was able to do more with symbolic-shaped inputs than most model exporters in the ecosystem, so it was useful to override fixed-sized models at load time.
- --set before the subcommand happens after analysis and model decluttering.
- --set after some subcommands happens before each turn. It is useful with a random input for profiling and benchmarking.
Now, looking at your error: during analysis, tract hits a shape rule while going through a convolution, where it manages to compute the output shape of the convolution with concrete dimensions (no symbols). But this output shape has already been computed elsewhere and contains a symbol. I'm guessing that shape was computed by propagation from the output shape (unlikely: by default the command line ignores the output shapes in ONNX models, as they are broken most of the time) or from a different input shape.
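To give an idea of what "unifying" means here, a toy sketch of the rule behind the error message (not tract's actual code, just the idea):

// Toy model of dimension-fact unification during analysis. Symbols are
// "universal" at this stage, so a symbol can only unify with itself,
// never with a concrete value: hence "Impossible to unify Sym(...) with Val(1)".
#[derive(Clone, Debug, PartialEq)]
enum DimFact {
    Any,         // nothing known yet
    Sym(String), // a named symbol, e.g. DynamicDimension.0
    Val(i64),    // a concrete dimension
}

fn unify(a: &DimFact, b: &DimFact) -> Result<DimFact, String> {
    use DimFact::*;
    match (a, b) {
        (Any, x) | (x, Any) => Ok(x.clone()),
        (Val(x), Val(y)) if x == y => Ok(Val(*x)),
        (Sym(x), Sym(y)) if x == y => Ok(Sym(x.clone())),
        // unify(&Sym("DynamicDimension.0".into()), &Val(1)) ends up here,
        // which is the error in the log above.
        (x, y) => Err(format!("Impossible to unify {x:?} with {y:?}")),
    }
}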
Does your model have more than one input? "-i" only affects one input at a time; you may need to add more of them to override all inputs.
This is happening at analysis, so "--set" will not help. TBH I am not 100% sure how asserts impact analysis, as they were introduced for NNEF models, which do not go through analysis. There may be something missing here.
And of course, it might very well be a bug :) It looks like you're doing OCR, which is a relatively exotic field for tract models, so you may be activating rare and fragile code paths.
Any chance you can share a model with me if these indications do not help?
Currently I'm using onnxruntime; I was just curious whether I could get this working with tract and how it compares performance-wise. So far I've had a few issues:
- symbols contain dots in them:
fn identifier<'i>(symbol_table: &SymbolScope, i: &'i str) -> R<'i, Symbol> {
- map(recognize(pair(alt((alpha1, tag("_"))), many0(alt((alphanumeric1, tag("_")))))), |s| {
- symbol_table.sym(s)
- })
+ map(
+ recognize(pair(alt((alpha1, tag("_"))), many0(alt((alphanumeric1, tag("_"), tag(".")))))),
+ |s| symbol_table.sym(s),
+ )
.parse(i)
}
- models use a floor function. For now I worked around it like this, but this is a larger issue, as the model seems to assume that dim evaluation uses floating-point arithmetic instead of integers:
fn atom<'i>(symbol_table: &SymbolScope, i: &'i str) -> R<'i, TDim> {
map(numeric, TDim::Val),
map(|i| func(symbol_table, "min", i), TDim::Min),
map(|i| func(symbol_table, "max", i), TDim::Max),
+ map(|i| func(symbol_table, "floor", i), |xs| xs[0].clone()),
map(|i| identifier(symbol_table, i), TDim::Sym),
map(pair(recognize(stag("-")), |i| atom(symbol_table, i)), |(_, dim)| dim * -1),
delimited(stag("("), |i| expr(symbol_table, i), stag(")")),
- the tdim parser seems to blow up exponentially the way it is written, causing some small models (~4 MB) to fail to load in a reasonable time frame (it looks like it would terminate eventually):
you can try this model for example: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.3.0/onnx/PP-OCRv5/det/ch_PP-OCRv5_mobile_det.onnx
- the above issue seems resolved with this workaround:
diff --git a/onnx/src/tensor.rs b/onnx/src/tensor.rs
index 0feddc3e9..abb53d6b2 100644
--- a/onnx/src/tensor.rs
+++ b/onnx/src/tensor.rs
@@ -46,11 +46,18 @@ pub fn translate_inference_fact(
Ok(DimFact::from(v.to_dim()))
}
Some(tensor_shape_proto::dimension::Value::DimParam(v)) => {
- if v == "?" || (v.starts_with("unk__") && !include_unknown_symbols) {
+ if v == "?"
+ || (v.starts_with("unk__") && !include_unknown_symbols
+ || v == "DynamicDimension.0"
+ || v == "DynamicDimension.1"
+ || v == "DynamicDimension.2")
+ {
Ok(DimFact::default())
} else {
but fails later with (possibly due to issue 2):
[2025-08-20T07:56:14.461082493Z ERROR tract] Error at stage "analyse"
Caused by:
0: ModelBuildingError
1: Failed analyse for node #249 "Conv.0" ConvHir
2: Infering facts
3: Applying rule outputs[0].shape == 1,16,24,109
4: Unifying shapes ?,16,24,1+(DynamicDimension.1)/2 and 1,16,24,109
5: Impossible to unify Add([Val(1), Div(Sym(DynamicDimension.1), 2)]) with Val(109).
you can try with this model: https://www.modelscope.cn/models/RapidAI/RapidOCR/resolve/v3.3.0/onnx/PP-OCRv5/rec/ch_PP-OCRv5_rec_mobile_infer.onnx
About the dots in symbols... do you know how this model was generated? I'm trying to figure out whether dots in symbols are an idiosyncrasy of these particular models, for which we can hack a fix (a simple Python script could traverse the protobuf graph to fix them, for instance), or something more common, for which we would want stable integrated support in tract.
Looking into the rest.
Don't really know how the models are generated (not an AI expert). I just started on this this week (new job) and the documentation is mostly in Chinese. Currently I can rasterize PDFs using pdfium, OCR them using code based on [0] and models from [1], and annotate the text back into the PDF using pdfium. Works surprisingly well, I'm quite impressed with the models so far.
- [0] https://github.com/mg-chao/paddle-ocr-rs
- [1] https://github.com/RapidAI/RapidOCR/blob/main/python/rapidocr/default_models.yaml
Allowed the dots and added the floor function in #1823, basically just like you did. I also rewrote the recursive part of the parser so it is less likely to blow up with nested tdim expressions.
This leaves us the most difficult part :)
With the previous patches in place on tract main, the above model now fails like this. It looks like tract computed the output shape (from the inputs?) but it does not match another value for the output shape.
Questions: 1/ where does this expression come from? 2/ why is it different? (I think an assertion on the parity of DynamicDimension.1 would help here, but this does not exist right now.)
Caused by:
0: ModelBuildingError
1: Failed analyse for node #239 "Conv.0" ConvHir
2: Infering facts
3: Applying rule outputs[0].shape == DynamicDimension.0,16,(DynamicDimension.1+1)/2,(DynamicDimension.2+1)/2
4: Unifying shapes DynamicDimension.0,16,1+(DynamicDimension.1)/2,1+(DynamicDimension.2)/2 and DynamicDimension.0,16,(DynamicDimension.1+1)/2,(DynamicDimension.2+1)/2
5: Impossible to unify Add([Val(1), Div(Sym(DynamicDimension.1), 2)]) with Div(Add([Sym(DynamicDimension.1), Val(1)]), 2).
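To make the parity point concrete, a small standalone sketch comparing the two expressions under the integer (truncating) division that TDim arithmetic uses:

// tract's computed output dim is 1 + D/2, the ONNX value_info expression is (D + 1)/2.
fn main() {
    for d in 0..6i64 {
        let tract_dim = 1 + d / 2; // 1+(DynamicDimension.1)/2
        let onnx_dim = (d + 1) / 2; // (DynamicDimension.1+1)/2
        println!("D = {d}: 1 + D/2 = {tract_dim}, (D+1)/2 = {onnx_dim}, equal: {}", tract_dim == onnx_dim);
    }
    // The two expressions only agree when D is odd, which is why an assertion
    // on the parity of DynamicDimension.1 would let the shapes unify.
}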
Answer to 1 is: from the ONNX file!
A couple of things hint that the model comes from https://www.paddleocr.ai/main/en/version2.x/legacy/paddle2onnx.html . So this ONNX converter apparently includes symbolic shape information on each node of the model. I don't think I have seen that before (or possibly it just worked with no fuss). And it assumes different semantics than ours (not sure whether anybody is right or wrong here).
So one (brutal) idea is to just allow ignoring the "value_info" facts from the ONNX graph altogether, which is what I did in #1827.
This brings us to the next problem:
cargo run -p tract -- ch_PP-OCRv5_mobile_det.onnx --onnx-ignore-value-info -i b,3,x,y,f32 dump --io-long
Caused by:
0: ModelBuildingError
1: Failed analyse for node #723 "Concat.0" InferenceConcat
2: Infering facts
3: Applying rule inputs[0].shape[2] == inputs[1].shape[2] == inputs[2].shape[2] == inputs[3].shape[2]
4: Impossible to unify MulInt(8, Div(Add([Sym(x), Val(31)]), 32)) with MulInt(4, Broadcast([MulInt(2, Div(Add([Sym(x), Val(31)]), 32)), Div(Add([Sym(x), Val(15)]), 16)])).
So constraints on the input shape may be required after all.
Will it work with some numeric values? @dvc94ch suggested "1,3,48,218" as an input shape.
Unfortunately this is not much better:
Caused by:
0: ModelBuildingError
1: Failed analyse for node #723 "Concat.0" InferenceConcat
2: Infering facts
3: Applying rule inputs[0].shape[2] == inputs[1].shape[2] == inputs[2].shape[2] == inputs[3].shape[2]
4: Impossible to unify Val(16) with Mul([Val(4), Broadcast([Add([Add([Add([Add([Broadcast([Val(3), Val(4)]), Val(1)]), Val(1)]), Val(-3)]), Val(1)]), Add([Add([Add([Add([Broadcast([Val(3), Val(4)]), Val(1)]), Val(1)]), Val(-3)]), Val(1)])])]).
We have unsimplified broadcast operators here, like "Broadcast([Val(3), Val(4)])", which are invalid (3 and 4 cannot be broadcast against each other under numpy rules).
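For reference, numpy-style broadcasting of a pair of dimensions only succeeds when they are equal or one of them is 1; a minimal sketch of that rule:

// Broadcast of two dims: compatible when equal or when one side is 1.
// Broadcast([3, 4]) has no valid result, hence it stays "unsimplified".
fn broadcast_dim(a: usize, b: usize) -> Option<usize> {
    match (a, b) {
        (a, b) if a == b => Some(a),
        (1, b) => Some(b),
        (a, 1) => Some(a),
        _ => None, // e.g. broadcast_dim(3, 4) == None
    }
}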
@dvc94ch how confident are you about the validity of "1,3,48,218" as an input shape? If you're sure it's valid, it means tract messes up the computation of the shape somewhere along the way. The next step would be to compare what tract computes against either what the expressions in value_info actually tell us, or against what onnxruntime computes for these shapes, and find out where tract diverges.
Awesome progress! Thank you!
Unless there's a typo, it should be a valid input shape printed from an onnxruntime run. I can't check until Monday as it's on my work laptop. I will try to dump the graph from onnxruntime for debugging.
Ha, thanks, a dump of the graphs with the shapes would be ideal.
The three models required for the full OCR flow are:
- detecting text boxes model: ch_PP-OCRv5_mobile_det.onnx
  - status: this one parses now, but still fails at the analysis stage
  - shape: DynamicDimension.0,3,DynamicDimension.1,DynamicDimension.2
  - valid shape: 1,3,192,768
  - full shape dump: dump.txt
- detecting angle of text model: ch_ppocr_mobile_v2.0_cls_infer.onnx
  - status: this one always worked as the shape is fixed
  - shape: 1,3,48,192
- the actual OCR stage model: ch_PP-OCRv5_rec_mobile_infer.onnx
  - status: this one also still fails at the analysis stage
  - shape: DynamicDimension.0,3,48,DynamicDimension.1
  - valid shape: 1,3,48,218
this should be helpful indeed. stay tuned.
Any progress? Looking forward to trying this out...
Sorry for the delay, but the good news is, with the right set of options, I think we're good. We need to ignore both the internal values AND the output shapes. On the command line, it looks like: --onnx-ignore-value-info --onnx-ignore-output-shapes. On the API it will be .with_ignore_output_shapes(true).with_ignore_value_info(true).
With this, the network loads, analyses and optimizes. As far as I can tell, it should work.
Got it working, however it's really slow compared to onnxruntime (around 10x). I didn't optimize or tune anything for either onnxruntime or tract, just dropped it in as a replacement. Maybe onnxruntime runs on the GPU by default? Although it's just a Lenovo laptop, nothing especially beefy.
use tract_onnx::prelude::*;

// Load the ONNX model while ignoring the exporter's value_info and output
// shape annotations, pin the input shape, then optimize and make it runnable.
let model = tract_onnx::onnx()
    .with_ignore_output_shapes(true)
    .with_ignore_value_info(true)
    .model_for_path(path)?
    .with_input_fact(0, f32::fact([1, 3, 1120, 800]).into())?
    .into_optimized()?
    .into_runnable()?;
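For completeness, a minimal sketch of invoking it, following the usual tract pattern (a dummy zero tensor here instead of the real preprocessed page image):

// Dummy NCHW input matching the fact above, just to exercise the pipeline.
let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 1120, 800)).into();
let result = model.run(tvec!(input.into()))?;
println!("output shape: {:?}", result[0].shape());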
Not a big surprise here. I will have a look and see if I can spot anything obviously wrong in how tract does it, but don't hold your breath. It's very likely onnxruntime goes to the GPU indeed. tract GPU support is nascent, with a focus on the LLM problem at this point, so 1D/2D convnets will not take advantage of it in the current state of affairs.
I noticed the dominating presence of Resize in the profile split. This operator is not completely optimized: it is a "big" operator, and there are a lot of combinations of options that may need specific attention. It represents 46% of the model time, so there may be some significant gains here. Not enough to catch up with onnxruntime, but it would be an interesting first step.
Good news is, all these Resize operators are identical, so we would only have to look into this one configuration.
Just for fun, and to get an idea of the state of affairs for non-LLM model support on CUDA: left is CUDA, right is CPU:
We're not there yet :) Only a couple of matrix multiplication and arithmetic operators are offloaded to the GPU, with only a tiny marginal impact on speed.
I doubt I'll get time to work on adding support for Resize on the GPU (since onnxruntime works fine for us atm), but thanks a lot for your help getting this far.
I was not suggesting that. Resize on CUDA on its own would have limited impact here: to leverage CUDA, we need large sections of the model (ideally all of it) to run on the GPU. So we need Resize, convolution, depthwise convolution, etc.
The usage of Resize is intriguing here. If I am not mistaken, it is used to generate bigger tensors from smaller ones by just "tiling" the data (in an image analogy, it would make a bigger image by replacing each pixel with a square of, say, 8x8 pixels using the color of the incoming pixel). I will have a look at this, because if I'm right, the Tile operator may be significantly faster, so we may be able to optimize Resize easily.
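To make the "tiling" intuition concrete, a tiny sketch of nearest-neighbour upscaling of a single HxW plane by an integer factor (just the idea, not tract code):

// Each source element is copied into an f x f block of the destination, which
// is why a Tile-like implementation could cover this Resize configuration.
fn upscale_nearest(src: &[f32], h: usize, w: usize, f: usize) -> Vec<f32> {
    let mut dst = vec![0.0f32; h * f * w * f];
    for y in 0..h * f {
        for x in 0..w * f {
            dst[y * (w * f) + x] = src[(y / f) * w + (x / f)];
        }
    }
    dst
}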
I am also trying to wrap my head around why it does that. The output seems to be pushed to a big concat on the depth axis, which in turn goes through a convolution. So maybe we can find some relatively easy optimisations for the CPU.
I will keep this alive as a low priority task / hobby, I'm curious.
I am also thinking that onnxruntime probably does not go to the GPU if you don't ask for it. What I think it may do, on the other hand, is make use of multithreaded implementations of operators (I think Resize, matmul and convolution are good candidates for this). Multi-threading in tract is opt-in, and only the matrix multiplication has a multi-threaded implementation (with limited benefits).