
Full support for dynamic If

Open kali opened this issue 2 years ago • 19 comments

ONNX If currently only works as a load-time macro expansion: the condition must be known and fixed at loading time, and the right branch is substituted in during the type phase. This blocks Silero VAD v4 (see #1029).

  • [x] make analyse work when input is dynamic
  • [x] implement an If operator in core and translate ONNX op into it
  • [ ] find out how to encode this in NNEF
  • [ ] into_optimized as in Scan
  • [ ] declutter as in Scan

kali avatar Apr 25 '23 11:04 kali

Linked #1038, covering the first two tasks.

kali avatar Apr 25 '23 11:04 kali

@kali do you have any plans for implementing tasks 4 and 5 anytime soon :)? If possible, could you please leave some hints on how it could be implemented? I'd like to give it a try. Thank you.

7r3nzy avatar May 21 '23 23:05 7r3nzy

I think it should start with 5. There is a huge amount of code in Scan to declutter it (in tract terminology, decluttering is where the high-level optimisations are actually implemented). I think most of what we do in Scan can be mapped to ifte, but the Scan code is not generic. Plus we have an ongoing epic that aims at "flattening" Scan, inlining the loop body inside the main model to normalize the node connections, so a lot of the Scan declutter code will probably go away or be drastically simplified. So if you start working on ifte and perform similar simplifications, we will later on have to go down the same path and flatten the ifte too, making a lot of these efforts redundant.

So if you want to work on ifte decluttering, I think you should keep it use-case driven, not implementing the whole thing before we get more clarity about what is happening with Scan. Does it make sense? If so, we should probably move this to a different issue and target optimising the silero vad use-case, at least as a starting point. Refresh my memory: are we in a position where we can look at a model dump (in the command line) well enough to identify possible bottlenecks yet?

kali avatar May 22 '23 07:05 kali

@kali Thanks for the response. It does make sense. I think you are probably referring to this PR: https://github.com/sonos/tract/pull/1090?

About the bottlenecks, I don't think we have gotten to that point yet, since running the latest tract (main branch, last commit 146b2ac) on silero-vad v4 throws the following error:

tract silero-vad/files/silero_vad.onnx
┏ 1 Source sr
┃   ━━━ I64
┣┻ 73 Equals Equal_23
┃   ━━━ Bool
┃┏ 3 Source c
┃┃   ━━━ 2,batch,64,F32
┃┃┏ 2 Source h
┃┃┃   ━━━ 2,batch,64,F32
┃┃┃┏ 0 Source input
┃┃┃┃   ━━━ batch,sequence,F32
┣╋╋┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻ 74 If If_25
      ━━━ batch,1,F32
      ━━━ 2,batch,64,F32
      ━━━ 2,batch,64,F32
[2023-06-03T23:08:18.550720900Z ERROR tract] Error at stage analyse

    Caused by:
        0: Failed analyse for node #74 "If_25" If
        1: Failed analyse for node #74 "If_25" If
        2: Infering facts
        3: Unifying shapes Unsqueezeout_dim_0,1 and batch,1
        4: Impossible to unify Sym(Unsqueezeout_dim_0) with Sym(batch).
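For context on that last message: the analyser unifies the two shapes dim by dim, and two distinct free symbols cannot be proven equal. A minimal pure-Python sketch of that kind of unification (illustrative only, not tract's actual code; function names are made up):

```python
# Illustrative sketch of dim-by-dim shape unification, loosely modelled
# on what an analyser does with symbolic dims. Not tract's actual code.

def unify_shapes(s1, s2, subst=None):
    """Unify two shapes whose dims are ints or symbol names (str).
    A symbol may be bound to a concrete value via `subst`, but two
    distinct unbound symbols cannot be proven equal."""
    subst = {} if subst is None else subst
    if len(s1) != len(s2):
        raise ValueError(f"rank mismatch: {s1} vs {s2}")
    out = []
    for a, b in zip(s1, s2):
        a, b = subst.get(a, a), subst.get(b, b)
        if a == b:
            out.append(a)
        elif isinstance(a, str) and not isinstance(b, str):
            subst[a] = b  # bind symbol a to the concrete dim
            out.append(b)
        elif isinstance(b, str) and not isinstance(a, str):
            subst[b] = a  # bind symbol b to the concrete dim
            out.append(a)
        else:
            # two different symbols (or two different ints): no equation
            # relates them, so unification fails, as in the error above
            raise ValueError(f"Impossible to unify Sym({a}) with Sym({b})")
    return out

unify_shapes(["batch", 1], [8, 1])  # ok: binds batch=8
# unify_shapes(["Unsqueezeout_dim_0", 1], ["batch", 1])  # raises
```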

7r3nzy avatar Jun 04 '23 04:06 7r3nzy

@7r3nzy yes, I'm referring to 1090 (and a couple of them that will follow).

Is it not just a problem of --onnx-ignore-output-shapes again? Maybe I should actually make ignoring them the default.

kali avatar Jun 05 '23 06:06 kali

@kali Thanks for pointing that out :). It moved on to the next step I think, but I am now getting:

 tract silero-vad/files/silero_vad.onnx --onnx-ignore-output-shapes
[2023-06-05T08:26:31.441282388Z ERROR tract] Error at stage type

    Caused by:
        0: Translating node #74 "If_25" If ToTypedTranslator
        1: Translating node #29 "If_69" If ToTypedTranslator
        2: in output_facts invocation for If
        3: Condition failed: `self.then_body.output_fact(i)?.without_value() == self.else_body.output_fact(i)?.without_value()` (batch,1,sequence+192,F32 vs batch,1,1,sequence+192,F32)

7r3nzy avatar Jun 05 '23 08:06 7r3nzy

Ok, this looks more serious. You can use "--pass analyse" to see the network before it crashes. It looks like the "then" and "else" branches have different ranks here, with an extra dimension in the else...

kali avatar Jun 05 '23 08:06 kali

It produced the following output without any crash:

tract silero-vad/files/silero_vad.onnx --onnx-ignore-output-shapes --pass analyse
┏ 1 Source sr
┃   ━━━ I64
┣┻ 73 Equals Equal_23
┃   ━━━ Bool
┃┏ 3 Source c
┃┃   ━━━ 2,batch,64,F32
┃┃┏ 2 Source h
┃┃┃   ━━━ 2,batch,64,F32
┃┃┃┏ 0 Source input
┃┃┃┃   ━━━ batch,sequence,F32
┣╋╋┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻┻ 74 If If_25
      ━━━ batch,1,F32
      ━━━ 2,batch,64,F32
      ━━━ 2,batch,64,F32

7r3nzy avatar Jun 05 '23 08:06 7r3nzy

Ha. And we don't see the then and else branches, they are not plugged in. Can you link me the onnx file again so I can have a look?

kali avatar Jun 05 '23 08:06 kali

Yep, here it is https://github.com/snakers4/silero-vad/archive/refs/tags/v4.0.tar.gz

Path inside the archive: files/silero_vad.onnx

7r3nzy avatar Jun 05 '23 09:06 7r3nzy

Please have a look at https://github.com/sonos/tract/pull/1102 . It's... complicated, but I guess the network is. I hope you can figure out where the discrepancy is.

kali avatar Jun 05 '23 11:06 kali

The PR just fixes the dump format so that If branches are shown in the graph.

kali avatar Jun 05 '23 11:06 kali

Thank you so much for this. I used netron as well as the new tract patch to analyse the network:

[loop] ┃┃┣┻ 57 If If_69

┃┃┃ [loop|loop] ┏ 2 Source input_data1
┃┃┃ [loop|loop] ┃   ━━━ batch,1,1,sequence+192,F32
┃┃┃ [loop|loop] ┣┻ 1 Squeeze13 Squeeze_71
┃┃┃ [loop|loop]     ━━━ batch,1,sequence+192,F32
┃┃┃ [loop|else] ┏ 1 Source input_data1
┃┃┃ [loop|else] ┃   ━━━ batch,1,1,sequence+192,F32
┃┃┃ [loop|else] ┣ 0 Identity Identity_72
┃┃┃ [loop|else]     ━━━ batch,1,1,sequence+192,F32

If_25 then If_69 

last node name=176 | else type: float32[Squeeze175_dim_0,Identity176_dim_1,Squeeze175_dim_1,Squeeze175_dim_2]
last node name=175 | then type: float32[Squeeze175_dim_0,Squeeze175_dim_1,Squeeze175_dim_2]

I think the important thing to note from the onnx If operator specs is:

The then_branch and else_branch may produce tensors with the same element type and different shapes.

I am also leaving the onnx if operator specs here for easy access:

If (conditional)

This version of the operator has been available since version 19 of the default ONNX operator set.

Other versions of this operator: 1, 11, 13, 16

Attributes

else_branch : graph (required)
    Graph to run if condition is false. Has N outputs: values you wish to be live-out to the enclosing scope. The number of outputs must match the number of outputs in the then_branch.
then_branch : graph (required)
    Graph to run if condition is true. Has N outputs: values you wish to be live-out to the enclosing scope. The number of outputs must match the number of outputs in the else_branch.

Inputs

cond : B
    Condition for the if

Outputs (1 - ∞)

outputs (variadic, heterogeneous) : V
    Values that are live-out to the enclosing scope. The return values in the then_branch and else_branch must be of the same data type. The then_branch and else_branch may produce tensors with the same element type and different shapes. If corresponding outputs from the then-branch and the else-branch have static shapes S1 and S2, then the shape of the corresponding output variable of the if-node (if present) must be compatible with both S1 and S2 as it represents the union of both possible shapes. For example, if in a model file, the first output of then_branch is typed float tensor with shape [2] and the first output of else_branch is another float tensor with shape [3], If's first output should have (a) no shape set, or (b) a shape of rank 1 with neither dim_value nor dim_param set, or (c) a shape of rank 1 with a unique dim_param. In contrast, the first output cannot have the shape [2] since [2] and [3] are not compatible.

7r3nzy avatar Jun 06 '23 02:06 7r3nzy

Wow. Different shapes? The ONNX spec is borderline crazy here. I can manage to support different shapes of the same rank with symbols, but supporting different ranks is an absolute no for tract.

Can you distill the use-case for different ranks down for me? I can't even imagine how that could be useful. I would assume it's nearly immediately followed by a Reshape, right? Can we push this Reshape up into the branches?

kali avatar Jun 06 '23 07:06 kali

It is immediately followed by Conv:

[netron screenshot of the If_69 node]

Can we push this Reshape up into the branches?

The problem is, I don't think the silero-vad devs have published the sources they used to train this model, so even if we change the model I'd be stuck on training it :(

Can you distill down the use-case for me for different ranks ?

My knowledge about the network is very limited, but I'll give it a try and get back to you if I find anything :)

7r3nzy avatar Jun 06 '23 07:06 7r3nzy

What kind of shapes is netron reporting? I am surprised that the convolution can deal with the two axes configurations, as its weights have to be of the same rank as the input. Unless they are altering the weight shapes too?

We need to understand what they are doing, and which operator collapses the two axes configurations into one. From there we can try to push the change back up to unify the branch configurations inside the If. Do not worry too much yet about the sources and training: we can alter the model onnx file with a python script, probably just inserting a reshape/squeeze/unsqueeze in the right place to make the axes consistent, before handing it to tract.

kali avatar Jun 06 '23 07:06 kali

It may also be a tract bug while inferring shapes in one of the branches. Let's hope the Silero team can help us understand what we are looking at.

kali avatar Jun 06 '23 08:06 kali

More details on the conv node:

[netron screenshot of the conv node]

tract output:

[loop] ┃┃┣┻ 57 If If_69
┃┃┃ [loop|loop] ┏ 2 Source input_data1
┃┃┃ [loop|loop] ┃   ━━━ ..,?
┃┃┃ [loop|loop] ┣┻ 1 Squeeze13 Squeeze_71
┃┃┃ [loop|loop]     ━━━ Squeeze175_dim_0,Squeeze175_dim_1,Squeeze175_dim_2,F32
┃┃┃ [loop|else] ┏ 1 Source input_data1
┃┃┃ [loop|else] ┃   ━━━ ..,?
┃┃┃ [loop|else] ┣ 0 Identity Identity_72
┃┃┃ [loop|else]     ━━━ Squeeze175_dim_0,Identity176_dim_1,Squeeze175_dim_1,Squeeze175_dim_2,F32
[loop] ┃┃┃┏ 134 Source model.feature_extractor.forward_basis_buffer
[loop] ┃┃┃┃   ━━━ 258,1,256,F32 0, 0.00015059065, 0.0006022719, 0.0013547717, 0.0024076367, 0.0037602326, 0.005411745, 0.0073611788, 0.00960736, 0.012148935, 0.014984373, 0.018111967...
[loop] ┃┃┣┻ 58 ConvHir Conv_73
[loop] ┃┃┣┓
[loop] ┃┃┃┣┻┻┻┻ 61 StridedSlice Slice_78
[loop] ┃┃┃┣┻ 62 Pow Pow_87
[loop] ┃┃┗┓
[loop] ┃┃┃┣┻┻┻┻ 59 StridedSlice Slice_84
[loop] ┃┃┃┣┻ 60 Pow Pow_89
[loop] ┃┃┣┻ 63 Add Add_90

These are just plain outputs without any analysis; I'll get back to you with more details.

7r3nzy avatar Jun 06 '23 08:06 7r3nzy

Guess dropping the "sonos" name in the issue at Silero generated some confusion and/or wrong expectations :) Not sure we're gonna get much help from there.

kali avatar Jun 06 '23 08:06 kali