
burn-import initializer tensor not added to scope

Open • mosure opened this issue 1 year ago • 16 comments

Describe the bug: nodes with input tensors of category initializer fail to import due to the initializer tensor names not being in scope.

To Reproduce: import a minimal ONNX graph with an initializer tensor, e.g.

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x):
        x = F.interpolate(x, size=(2, 2), mode='bilinear', align_corners=True)
        return x


Expected behavior: the burn-import generated model contains the tensor initializers, e.g. _Concat_output_0 is defined in the scope.

Actual behavior: burn-import fails with "_Concat_output_0 is not in scope", panicking at this line: https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-import/src/burn/scope.rs#L53

Desktop (please complete the following information):

  • OS: Windows

Additional context: this occurs in resize, gather, mul_scalar, and sub_scalar node imports as well (with initializer tensors).

mosure avatar Jun 12 '24 16:06 mosure

@skewballfox, do you think this is related to #1857?

antimora avatar Jun 12 '24 17:06 antimora

was this encountered with the version of burn last published on crates.io, or on main?

also, I'm a bit confused. is /Concat_output_0 both a node output and an initializer?

skewballfox avatar Jun 12 '24 17:06 skewballfox

this is encountered on main.

here is a more minimal reproducible example which clarifies the /Concat_output_0 initializer/output ambiguity:


Python:
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as an initializer
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_initializer = numpy_helper.from_array(sizes, name="sizes")

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[sizes_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()

mosure avatar Jun 12 '24 17:06 mosure

the same scope import error happens with constant input tensors too, e.g.


Python:
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as a constant node
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_const_node = helper.make_node(
        "Constant",
        name="sizes_const",
        inputs=[],
        outputs=["sizes"],
        value=numpy_helper.from_array(sizes)
    )

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[sizes_const_node, resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()

mosure avatar Jun 12 '24 17:06 mosure

@antimora I think this might not be related to lifting initializers or constants. Here's some information from the debug print of resize.onnx prior to any editing:

model_proto: ModelProto {
...
                    NodeProto {
                        input: [
                            "input_tensor",
                            "",
                            "",
                            "sizes",
                        ],
                        output: [
                            "output",
                        ],
                        name: "resize_node",
...

sizes is in the list of initializers.

the two fields with "" are roi and scales. The two solutions that come to mind are: create a dummy entry for empty strings, or rework the API for the functions that might have empty strings as inputs so they take a default or None input arg (a sketch of the latter follows).
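
For illustration, here's a minimal sketch of the None-input option in plain Rust; the Argument type and function name are stand-ins, not burn-import's actual API:

#[derive(Clone, Debug)]
struct Argument {
    name: String,
}

// An empty string in an ONNX NodeProto input slot means the optional input
// was omitted, so map "" to None instead of trying to resolve it in scope.
fn optional_input(slot: &str) -> Option<Argument> {
    if slot.is_empty() {
        None
    } else {
        Some(Argument { name: slot.to_string() })
    }
}

fn main() {
    // Resize's input slots are [X, roi, scales, sizes]; roi and scales omitted.
    let slots = ["input_tensor", "", "", "sizes"];
    let parsed: Vec<Option<Argument>> = slots.iter().map(|s| optional_input(s)).collect();
    assert!(parsed[1].is_none() && parsed[2].is_none());
    println!("{parsed:?}");
}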

skewballfox avatar Jun 12 '24 20:06 skewballfox

a minimal gather import with Initializer indices also has a scope panic. gather imports correctly with Constant indices.

I don't think gather has the same empty input names as resize: https://onnx.ai/onnx/operators/onnx__Gather.html


Broken initializer gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as an initializer
    indices = np.array([0, 2], dtype=np.int64)
    indices_initializer = helper.make_tensor(
        name="indices",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[indices_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()


Working constant gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as a constant node
    indices = np.array([0, 2], dtype=np.int64)
    indices_const_tensor = helper.make_tensor(
        name="indices_const_tensor",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    indices_const_node = helper.make_node(
        "Constant",
        name="indices_const_node",
        inputs=[],
        outputs=["indices"],
        value=indices_const_tensor,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[indices_const_node, gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()

I'd be curious to know more about the dummy-entry solution for empty input names: what mechanism of burn_import is causing empty names to be ignored?

mosure avatar Jun 12 '24 23:06 mosure

I'd be curious to know more about the dummy-entry solution for empty input names: what mechanism of burn_import is causing empty names to be ignored?

When we initialize nodes from the proto data, all input arguments (which are simply strings, the original names of the arguments) pass through this function. All that gets stored in the inputs are the original graph inputs and the node outputs after processing.

I was sort of wrong in my initial assumption: I assumed using an empty string for a HashMap key wouldn't work, but I just tried it on the Rust playground and it's allowed (quick check below). I think we still need to check for empty string names, or some placeholder to indicate that a default value should be used, though.
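
For the record, here's the kind of quick check I mean (plain std, nothing burn-specific):

use std::collections::HashMap;

// Confirming the claim above: an empty string is a valid HashMap key.
fn main() {
    let mut map: HashMap<String, &str> = HashMap::new();
    map.insert(String::new(), "dummy entry for an omitted optional input");
    assert_eq!(map[""], "dummy entry for an omitted optional input");
    println!("empty-string key works: {}", map[""]);
}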

Honestly I'm kind of stumped. I'm going to recreate the gather.onnx locally tomorrow.

skewballfox avatar Jun 13 '24 00:06 skewballfox

I thought the issue with the initializers might be that the original rename input function renamed arguments to an empty string, so maybe there was some "if empty inline tensor" check happening. But when I tried that with sizes on resize.onnx, I ran into a different error: "Tensor of Kind Int with dim shape None was passed with empty name".

Whatever's happening happens only with initializers, not with other generated inputs, and not when those initializers are graph inputs.

skewballfox avatar Jun 13 '24 00:06 skewballfox

the same scope import error happens with constant input tensors too, e.g.

...

#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as a constant node
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_const_node = helper.make_node(
        "Constant",
        name="sizes_const",
        inputs=[],
        outputs=["sizes"],
        value=numpy_helper.from_array(sizes)
    )
...

Looking at how that constant is defined, it will be lifted. So by the time scope is checked, it will only exist as an input to resize.

Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

skewballfox avatar Jun 13 '24 14:06 skewballfox

Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

@skewballfox, you must be referring to an ONNX Constant. Netron does not show an individual Constant node if it's referenced by a single node. ONNX import supports constants (see code). The constant value is stored as an attribute of the model module struct; it's a Param without gradients.
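
For context, here's a rough sketch of what that looks like in a generated model, assuming burn's Module derive; the struct and field names are illustrative, not actual burn-import output:

use burn::module::{Module, Param};
use burn::tensor::{backend::Backend, Tensor};

// A lifted ONNX Constant held as a module attribute: a Param, so it
// serializes with the module, but with no gradient requirement.
#[derive(Module, Debug)]
pub struct Model<B: Backend> {
    constant1: Param<Tensor<B, 1>>,
}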

antimora avatar Jun 13 '24 15:06 antimora

Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

@skewballfox, you must be referring to an ONNX Constant. Netron does not show an individual Constant node if it's referenced by a single node. ONNX import supports constants (see code). The constant value is stored as an attribute of the model module struct; it's a Param without gradients.

This wouldn't be tied to a constant node because that constant is lifted, and the parent node is deleted.

I was more wondering about the input arguments to nodes that aren't graph inputs or node outputs: lifted constants (where the original node is deleted, like above), initializers, and generated arguments.

The reason I ask is that this seems to trigger when the argument only exists as an input, which is weird because scope is about determining ownership, and these args only exist in one place.

skewballfox avatar Jun 13 '24 16:06 skewballfox

adding this line to consume seems to fix the initializer import: self.inputs.extend(self.initializers.values().cloned());

https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-import/src/onnx/from_onnx.rs#L163


OnnxGraph::inputs doesn't capture the initializers.

this needs to contain the initializer tensor names when passed to register_input_output: https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-import/src/onnx/to_burn.rs#L328


edit: nvm, this results in the initializer being a parameter to forward: pub fn forward(&self, input1: Tensor<B, 4>, indices: Tensor<B, 1, Int>) -> Tensor<B, 4>

the error seems near this pathway, e.g. node input initializers are not being managed properly by BurnGraph: https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-import/src/burn/graph.rs#L40

mosure avatar Jun 13 '24 17:06 mosure

if I convert the initializer to a constant and disable constant lifting, I run into an error where an Int constant tensor is added to the imported model; however, all of the Initializer::init* methods only support float tensors:

https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-core/src/nn/initializer.rs#L79

let constant7: burn::module::Param<Tensor<B, 1, Int>> = burn::nn::Initializer::Zeros
            .init([1], device)
            .set_require_grad(false);
mismatched types
expected struct `Param<Tensor<B, burn::tensor::Int, _>>`
   found struct `Param<Tensor<_, burn::tensor::Float, _>>`
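
To illustrate one direction a fix could take, here's a toy analogy in plain Rust (not burn's actual Initializer API) of making a Zeros-style init generic over the element kind, so it can produce Int as well as Float values:

// Toy analogy of a kind-generic initializer; not burn's actual API.
trait Zero {
    fn zero() -> Self;
}

impl Zero for f32 {
    fn zero() -> Self { 0.0 }
}

impl Zero for i64 {
    fn zero() -> Self { 0 }
}

// A Zeros-style init that works for any element type implementing Zero,
// analogous to letting Initializer::init produce Int as well as Float params.
fn zeros<T: Zero + Clone>(n: usize) -> Vec<T> {
    vec![T::zero(); n]
}

fn main() {
    let float_param: Vec<f32> = zeros(1); // Float-kind analogue (works today)
    let int_param: Vec<i64> = zeros(1);   // Int-kind analogue (the missing case)
    println!("{float_param:?} {int_param:?}");
}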

mosure avatar Jun 13 '24 18:06 mosure

if I convert the initializer to a constant and disable constant lifting, I run into an error where an Int constant tensor is added to the imported model; however, all of the Initializer::init* methods only support float tensors:

https://github.com/tracel-ai/burn/blob/71bd5efbfaf220b5058dbf4f963326e0b0d38527/crates/burn-core/src/nn/initializer.rs#L79

let constant7: burn::module::Param<Tensor<B, 1, Int>> = burn::nn::Initializer::Zeros
            .init([1], device)
            .set_require_grad(false);
mismatched types
expected struct `Param<Tensor<B, burn::tensor::Int, _>>`
   found struct `Param<Tensor<_, burn::tensor::Float, _>>`

CC @nathanielsimard and @laggui. Should we make the initializer more generic for other tensor kinds?

antimora avatar Jun 13 '24 20:06 antimora

here is some work towards generic initializers, it would need more careful handling: https://github.com/mosure/burn/blob/8122e8b674fcad727c439d8c235b59fc7d7062e1/crates/burn-core/src/nn/initializer.rs#L79

mosure avatar Jun 13 '24 22:06 mosure

So I've been looking at the output of cargo -r -- graph_path ./out for both the broken graphs and the graphs that I know work.

  • Something interesting about the variables HashMap being checked: it's only added to during register_use.

  • register_future_use is called only on inputs, after register_use has been called on graph inputs and node outputs.

  • These functions are filtered to only look at the inputs that are tensors, probably because Rust allows implicit copying of basic numeric types.

  • This doesn't trigger on reshape, but that's because it looks like reshape doesn't store its rhs as a tensor; it's just a vector:

Reshape(
    ReshapeNode {
        input: TensorType {
            name: Ident(
                unsqueeze1_out1,
            ),
            dim: 5,
            kind: Float,
            shape: None,
        },
        output: TensorType {
            name: Ident(
                unsqueeze2_out1,
            ),
            dim: 6,
            kind: Float,
            shape: None,
        },
        shape: [
            1,
            1,
            3,
            4,
            5,
            1,
        ],
    },
)

I suspect this is going to panic anytime we have an initializer or lifted constant that's typed as a tensor by burn graph. So far, in all the situations where I know constants are lifted in an op covered by test cases, the lifted values weren't tensors by that point. If you guys can think of a test case that has a lifted tensor for an op that takes a tensor argument, please let me know. A toy model of the scope behavior follows.
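
Here's a toy reconstruction of that behavior (simplified names and logic, not burn-import's actual scope.rs): variables only enter the map via register_use on graph inputs and node outputs, so register_future_use on an initializer-only input finds nothing and panics:

use std::collections::HashMap;

#[derive(Default)]
struct Scope {
    variables: HashMap<String, usize>, // name -> recorded uses
}

impl Scope {
    // Only graph inputs and node outputs pass through here.
    fn register_use(&mut self, name: &str) {
        *self.variables.entry(name.to_owned()).or_insert(0) += 1;
    }

    // Called on node inputs; panics if the name was never registered.
    fn register_future_use(&mut self, name: &str) {
        match self.variables.get_mut(name) {
            Some(uses) => *uses += 1,
            None => panic!("{name} is not in scope"),
        }
    }
}

fn main() {
    let mut scope = Scope::default();
    scope.register_use("input_tensor"); // graph input
    scope.register_use("output");       // node output
    scope.register_future_use("input_tensor"); // fine
    scope.register_future_use("sizes"); // initializer-only input: panics
}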

skewballfox avatar Jun 14 '24 16:06 skewballfox