
[Good First Issue]: Support aten::avg_poolNd and aten::max_poolNd with no batch

Open mvafin opened this issue 1 year ago • 24 comments

Context

The OpenVINO component responsible for supporting PyTorch models is called the PyTorch Frontend (PT FE). PT FE converts a model represented in TorchScript to a model in the OpenVINO opset.

What needs to be done?

  • Modify the existing conversion rules for the pooling operations: https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/avg_poolnd.cpp and https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/max_poolnd.cpp.
  • OpenVINO MaxPool and AvgPool operations assume that the batch and channel dimensions exist in the input. To support the PyTorch operations, inputs that have no batch dimension should be reshaped to have a batch of 1 and reshaped back afterwards (see the sketch after this list).
  • Implement operation tests in tests/layer_tests/pytorch_tests. Please consider different data types, but keep a reasonable number of test cases.
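
A minimal sketch of this idea (an illustration only, not the final implementation): it assumes the statically known rank-3 (CHW) case and reuses the names already present in avg_poolnd.cpp (strides, pads, kernel, count_include_pad, rounding_type).

auto input = context.get_input(0);
// Assumption: the rank is static and equal to 3 (CHW), i.e. there is no batch dimension.
auto zero = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));
// CHW -> 1CHW
auto unsqueezed = context.mark_node(std::make_shared<v0::Unsqueeze>(input, zero));
auto pooled = context.mark_node(
    std::make_shared<v1::AvgPool>(unsqueezed, strides, pads, pads, kernel, !count_include_pad, rounding_type));
// 1C'H'W' -> C'H'W': remove the batch dimension that was added above.
return {context.mark_node(std::make_shared<v0::Squeeze>(pooled, zero))};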

Example Pull Requests

No response

Resources

Contact points

@openvinotoolkit/openvino-pytorch-frontend-maintainers

Ticket

No response

mvafin avatar Nov 07 '23 11:11 mvafin

.take

ahmadchalhoub avatar Dec 13 '23 05:12 ahmadchalhoub

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot] avatar Dec 13 '23 05:12 github-actions[bot]

Hello @ahmadchalhoub! Thank you for taking a look, please let us know if you have any questions. Just yesterday our CONTRIBUTING.md was updated with a technical guide - I highly recommend checking it out. :)

p-wysocki avatar Dec 15 '23 10:12 p-wysocki

@p-wysocki Thanks for your input! I'll definitely take a look at the updates :). I already started working on the PR for this but it might take me a few days.

ahmadchalhoub avatar Dec 15 '23 16:12 ahmadchalhoub

Hello @ahmadchalhoub, do you need any help? Are you still working on that issue?

I am happy to announce that we have created a channel dedicated to Good First Issues support on our Intel DevHub Discord server! Join it to receive support, engage in discussions, ask questions and talk to OpenVINO developers.

p-wysocki avatar Jan 03 '24 15:01 p-wysocki

Hi @p-wysocki! Very sorry for the delay; I got caught up with some things and will not be able to work on this issue. I will un-assign myself so someone else can pick it up :)

ahmadchalhoub avatar Jan 05 '24 00:01 ahmadchalhoub

No worries, you're welcome to pick up another issue anytime. :)

p-wysocki avatar Jan 05 '24 10:01 p-wysocki

@p-wysocki I am new to open source, could you please provide some guidance relevant to the above issue? I want to get started. I think the task is just to reshape the input twice, but I need some help understanding the code...

Sar2580P avatar Jan 07 '24 19:01 Sar2580P

@mmikolajcz could you help here?

mlukasze avatar Jan 08 '24 06:01 mlukasze

Hello @Sar2580P, you can start with our technical guide in CONTRIBUTING.md :)

You're also welcome to join our Intel DevHub Discord server, where you can receive chat support from OpenVINO developers.

p-wysocki avatar Jan 08 '24 12:01 p-wysocki

@p-wysocki I am new to open source, could you please provide some guidance relevant to the above issue? I want to get started. I think the task is just to reshape the input twice, but I need some help understanding the code...

That is exactly right, you need to reshape the input 2 times, but the main problem is calculating the shape to reshape to when the input shape is only known at runtime.
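
For reference, the runtime shape can be captured inside the converted graph with a ShapeOf node; a hypothetical fragment (reusing the frontend's context.mark_node helper) might look like this:

auto input = context.get_input(0);
// The actual dimensions are only known when the model is executed, so they have
// to be read from the graph itself rather than from input.get_partial_shape().
auto runtime_shape = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
// runtime_shape can then be sliced and concatenated to build the target shapes
// for the reshapes before and after pooling.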

mvafin avatar Jan 08 '24 12:01 mvafin

I am very sorry, but do you know the deadline for this bug?

SpaceDigi avatar Jan 12 '24 16:01 SpaceDigi

.take

krish1209 avatar Jan 16 '24 10:01 krish1209

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

github-actions[bot] avatar Jan 16 '24 10:01 github-actions[bot]

Hi. I am new to open-source contributing, but I am excited to start. I am also going to participate in GSoC '24 and am choosing the OpenVINO organization. Currently I'm pursuing a Bachelor's degree in CSE (AI and Machine Learning) and have some experience with deep learning and neural networks.

The problem here is that we need a batch dimension; "channels" refers to the components of an image, e.g. RGB, CMY, grayscale, etc. In a CNN it typically means the depth dimension of the input tensor.

OpenVINO's MaxPool and AvgPool operations assume that the batch and channel dimensions exist. Inputs that don't have a batch dimension need to be assigned a batch of 1, and after the operation the result should be brought back to its original rank.

In the code linked below we can try adding a block that checks whether the input has a batch dimension and, if not, assigns a batch of 1.

I am writing the code here since I don't know if there is a separate section for providing code as a solution. If any mentor could help me with how to use pull requests and how to know whether an issue is solved, it would be really helpful :)

In this repo, we can try the following code (this is my first open-source contribution, please be kind :) ): https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/avg_poolnd.cpp

OutputVector translate_avg_poolnd(const NodeContext& context) {
    // Same code as in the file linked above, only with the batch check added.
    auto input = context.get_input(0);

    // Check whether the input tensor has a batch dimension: we check if the rank
    // (number of dimensions) is known at conversion time and is less than 3.
    if (input.get_partial_shape().rank().is_static() && input.get_partial_shape().rank().get_length() < 3) {
        // now we add a batch size of 1
        auto batch_size = context.mark_node(v0::Constant::create(element::i32, Shape{}, {1}));
        input = context.mark_node(std::make_shared<v1::Broadcast>(input, batch_size));
    }

    // here the rest of the code is the same as in the linked file

    return {context.mark_node(
        std::make_shared<v1::AvgPool>(input, strides, pads, pads, kernel, !count_include_pad, rounding_type))};
}

Please feel free to guide me more and notify me if anything comes up.

krish1209 avatar Jan 16 '24 10:01 krish1209

@krish1209 Please check the contribution guide for tips on how to contribute and create pull requests.

The code you provided will not actually work as you expect: Broadcast accepts the shape to broadcast to, so when you provide 1 you ask it to broadcast the tensor to a scalar, which should fail. I would recommend using the Unsqueeze operation instead. The main difficulty in this task is that pooling can be done for any shape (NCH, CH, NCHW, CHW, NCHWD, CHWD), so you may want to use the information about which torch operation is used, e.g. aten::max_pool2d or aten::max_pool1d, which will help you understand which dimensions need to be pooled.

Also please write a test for all these cases first and make sure your solution passes all tests.

If you have any questions please feel free to ask.

mvafin avatar Jan 16 '24 11:01 mvafin

Okay, I'll wrap my head around it. Meanwhile, could you let me know how you are thinking of approaching this?

krish1209 avatar Jan 16 '24 11:01 krish1209

As of now, I am not aware of how to write a test for these cases, but I'll learn how.

Is it possible to handle the cases with if-else statements for the two operations, like aten::max_pool2d and aten::max_pool1d?

For example:

if (context.get_op_type() == "aten::max_pool2d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{3}));
} else if (context.get_op_type() == "aten::max_pool1d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{2}));
}

Similarly, for 3d we get Shape{4}, and for 4d we get Shape{5}.

Also, could you please provide the link to the Discord? The previous link is not working.

krish1209 avatar Jan 16 '24 11:01 krish1209

Please let me know the status of the problem and whether there are any suggested or recommended issues I can get my hands on. Thank you so much.

krish1209 avatar Jan 20 '24 16:01 krish1209

Thank you for your patience, and sorry for the late reply.

Is it possible to handle the cases with if-else statements for the two operations, like aten::max_pool2d and aten::max_pool1d?

@mvafin could you take a look?

Also could you please provide with the link to discord

The link has been updated in the CONTRIBUTING.md - Intel DevHub Discord server.

p-wysocki avatar Jan 22 '24 13:01 p-wysocki

Hello @krish1209, are you still willing to contribute?

@mvafin could you please take a look at the latest question if @krish1209 is still here?

p-wysocki avatar Feb 19 '24 14:02 p-wysocki

Yes yes I am willing to contribute. Please let me know

krish1209 avatar Feb 19 '24 14:02 krish1209

@krish1209 Yes, but instead of if-else it is better to create a common function plus functions that call it with the number of dimensions. For example, this is done for adaptive_pool: https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/adaptive_poolnd.cpp
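
Roughly, that pattern could look like the sketch below; the base function's name and parameters are illustrative only, and adaptive_poolnd.cpp remains the reference for how this is actually structured.

// Shared conversion logic (declaration only): reads kernel/strides/pads/dilations
// from the node, adds a batch dimension when the input has only pool_dims + 1
// dimensions, creates the pooling node and restores the original rank afterwards.
OutputVector translate_max_pool_base(const NodeContext& context, size_t pool_dims);

OutputVector translate_max_pool1d(const NodeContext& context) {
    return translate_max_pool_base(context, 1);
}

OutputVector translate_max_pool2d(const NodeContext& context) {
    return translate_max_pool_base(context, 2);
}

OutputVector translate_max_pool3d(const NodeContext& context) {
    return translate_max_pool_base(context, 3);
}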

mvafin avatar Feb 20 '24 15:02 mvafin

I'll look into it. I got a little busy with my college projects, sorry. Will get back to you!

krish1209 avatar Feb 23 '24 17:02 krish1209

Based on the code written for adaptive pooling, I have written similar common functions for the max_pool and avg_pool operations. I will provide the code with a little explanation, and if it is seen fit, please allow me to create a pull request based on it, as it would also help me towards GSoC :). So this is the code; please check it and guide me if it is wrong or if it can be done better.

// This is the common function for defining the max pool operation
OutputVector translate_max_pool_base(const NodeContext& context, const Output<Node>& input,
                                     const Shape& kernel, const Strides& strides = Strides{},
                                     const Shape& pads = Shape{}, const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR);

// For 1D max pooling
OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.get_single_value<int64_t>(context.get_input_shape(1))};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}

// For 2D max pooling
OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_h = context.get_single_value<int64_t>(context.get_input_shape(1));
    const auto kernel_w = context.get_single_value<int64_t>(context.get_input_shape(2));
    const auto kernel_shape = Shape{kernel_h, kernel_w};
    // Optional arguments (strides, pads, dilations) can be handled here
    const Strides strides = context.get_input_size() > 5
                                ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(5))
                                : Strides{};
    const Shape pads = context.get_input_size() > 3
                           ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(3))
                           : Shape{};
    const Strides dilations = context.get_input_size() > 4
                                  ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(4))
                                  : Strides{};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape, strides, pads, dilations);
}

// For 3D max pooling (similar to 2D but with an extra dimension)
OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_d = context.get_single_value<int64_t>(context.get_input_shape(1));
    const auto kernel_h = context.get_single_value<int64_t>(context.get_input_shape(2));
    const auto kernel_w = context.get_single_value<int64_t>(context.get_input_shape(3));
    const auto kernel_shape = Shape{kernel_d, kernel_h, kernel_w};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}

krish1209 avatar Mar 08 '24 07:03 krish1209

@krish1209 You seem on the right track, but there are some issues you need to pay attention to:

  • please note that there is no context.get_single_value in the PyTorch frontend's NodeContext; I think you want to use context.const_input.
  • you should avoid using context.get_input_shape because it is not guaranteed to return a static shape; for example, in the scripting case it will return a fully dynamic shape with dynamic rank.
  • the approach that was used for adaptive pooling wasn't good and it will not work here, since you need to reshape the input first (CHW -> 1CHW) and, after pooling, remove the extra dimension you inserted. I would recommend passing the number of pooling dimensions to the base function and doing a Reshape to the shape [-1, 0, 0, 0] (pool2d case) with special zero (0 means "do not change this dimension"). To reshape back you can use Slice to extract shape_before_pool[:-2] and shape_after_pool[-2:] (2d case) and concat those two shapes to build the Reshape that follows the pooling; a rough sketch of the reshape-back part is shown below.
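
To illustrate only the reshape-back part of this suggestion (2d case), here is a hypothetical fragment; it assumes input is the original tensor before any reshape, pooled is the output of the pooling node, and that <limits> is included:

auto shape_before = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
auto shape_after = context.mark_node(std::make_shared<v3::ShapeOf>(pooled, element::i32));
auto zero = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));
auto minus_two = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {-2}));
auto int_max = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {std::numeric_limits<int32_t>::max()}));
auto step = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {1}));
// shape_before_pool[:-2]: the leading (batch/channel) dimensions stay as they were.
auto leading_dims = context.mark_node(std::make_shared<v8::Slice>(shape_before, zero, minus_two, step));
// shape_after_pool[-2:]: the pooled spatial dimensions.
auto spatial_dims = context.mark_node(std::make_shared<v8::Slice>(shape_after, minus_two, int_max, step));
// Concatenate both parts and reshape the pooled output back to the original rank.
auto target_shape = context.mark_node(std::make_shared<v0::Concat>(OutputVector{leading_dims, spatial_dims}, 0));
auto result = context.mark_node(std::make_shared<v1::Reshape>(pooled, target_shape, true));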

mvafin avatar Mar 08 '24 09:03 mvafin

So I tried this, considering the scripting case as well as the need to reshape the input first and then remove the extra dimension we inserted. I changed the translate_max_pool_base function to include an additional parameter, pooling_dims. This parameter indicates the number of pooling dimensions, allowing flexibility in specifying whether the pooling operation is 1D, 2D, or 3D. Then, to address the problem in the previous code (the input tensor's dimensions not matching the pooling dimensions, e.g. a missing batch dimension), the pooling dimension is explicitly passed as an argument to translate_max_pool_base. This is a glimpse of how it would look:

OutputVector translate_max_pool_base(const NodeContext& context, const Output<Node>& input,
                                     const Shape& kernel, const Strides& strides = Strides{},
                                     const Shape& pads = Shape{}, const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR,
                                     int pooling_dims = 2);  // Added parameter for pooling dimensions

OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {}, RoundingType::FLOOR, 1); // Pooling dimension set for 1D
}

OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    const auto input_size = context.const_input(0).sizes();
    const int64_t num_dims = input_size.size();
    
    Strides strides, pads, dilations;
    if (num_dims > 3) {
        strides = Strides{context.const_input(5).to_vector<int64_t>()};
        pads = Strides{context.const_input(3).to_vector<int64_t>()};
        dilations = Strides{context.const_input(4).to_vector<int64_t>()};
    }
    
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, strides, pads, dilations, RoundingType::FLOOR, 2); // Pooling dimension set for 2D
}

OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {}, RoundingType::FLOOR, 3); // Pooling dimension set for 3D
}

One thing is that it assumes that kernel shape is provided as a vector of int64_t values.

Please guide me further as I am learning more and more slowly :)

krish1209 avatar Mar 17 '24 18:03 krish1209

@krish1209 That looks okay; the only thing I don't understand is why you do an extra step for the 2d case when num_dims > 3? And the main difficulty will be implementing the base function.

mvafin avatar Mar 18 '24 09:03 mvafin

It is to cover the case where, even if the pooling operation is 2d (channels, height, width), additional dimensions such as a batch dimension might be present. The extra step then handles the optional arguments for strides, pads, and dilations.

To correctly handle the optional arguments for the pooling operation, the above code checks whether num_dims is greater than 3 and, if so, extracts the values for strides, pads, and dilations from the context's inputs accordingly. This makes sure the pooling operation is performed correctly even if the input tensor has additional dimensions beyond the expected 2D shape.

If this is fine, I will then proceed to implement the base function.

krish1209 avatar Mar 18 '24 11:03 krish1209

Hello guys, I'm facing this issue and am looking forward to it being fixed. My MaxPool3d layer is causing the problem:

nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 1, 1))

Are there any ways to work around this issue instead of waiting for new updates?

cannguyen275 avatar Apr 15 '24 03:04 cannguyen275