openvino
[Good First Issue]: Support aten::avg_poolNd and aten::max_poolNd with no batch
Context
The OpenVINO component responsible for support of PyTorch models is called the PyTorch Frontend (PT FE). PT FE converts a model represented as a TorchScript model into a model in the OpenVINO opset.
What needs to be done?
- Modify existing conversion rule for the pooling operations: https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/avg_poolnd.cpp and https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/max_poolnd.cpp.
- OpenVINO MaxPool and AvgPool operations assume that batch and channel dimensions exist in the input. To support the PyTorch operations, inputs that have no batch dimension should be reshaped to have a batch of 1, and reshaped back afterwards.
- Implement operation tests in tests/layer_tests/pytorch_tests. Please consider different data types, but keep a reasonable number of test cases.
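The reshape-wrapping step described above can be sketched as plain shape arithmetic. The following is an illustrative C++ sketch, not the OpenVINO API; `wrap_batch` is a hypothetical helper showing only the rank check and batch insertion:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the OpenVINO API): a pooling op with n_pool_dims
// spatial dimensions expects an input of rank n_pool_dims + 2 (N, C, spatial...).
// If the input arrives with rank n_pool_dims + 1 (no batch), prepend a batch of 1.
std::vector<std::int64_t> wrap_batch(std::vector<std::int64_t> shape, std::size_t n_pool_dims) {
    if (shape.size() == n_pool_dims + 1) {
        shape.insert(shape.begin(), 1);  // e.g. CHW -> 1CHW
    }
    return shape;
}
```

In the real conversion rule the input shape may only be known at runtime, so this logic has to be expressed with graph operations rather than a compile-time check.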
Example Pull Requests
No response
Resources
- torch.max_pool2d description
- Contribution guide - start here!
- What is OpenVINO?
- OpenVINO PyTorch Frontend
- Blog post on contributing to OpenVINO
- User documentation
Contact points
@openvinotoolkit/openvino-pytorch-frontend-maintainers
Ticket
No response
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hello @ahmadchalhoub! Thank you for taking a look; please let us know if you have any questions. Just yesterday our CONTRIBUTING.md was updated with a technical guide - I highly recommend checking it out. :)
@p-wysocki Thanks for your input! I'll definitely take a look at the updates :). I already started working on the PR for this but it might take me a few days.
Hello @ahmadchalhoub, do you need any help? Are you still working on that issue?
I am happy to announce that we have created a channel dedicated to Good First Issues support on our Intel DevHub Discord server! Join it to receive support, engage in discussions, ask questions and talk to OpenVINO developers.
Hi @p-wysocki! Very sorry for the delay; I got caught up with some things and will not be able to work on this issue. I will un-assign myself so someone else can pick it up :)
No worries, you're welcome to pick up another issue anytime. :)
@p-wysocki I am new to open-source, could you please provide some guidance relevant to the above issue? I want to get started; I think the task is just to reshape the input twice, but I need some understanding of the code...
@mmikolajcz could you help here?
Hello @Sar2580P, you can start with our technical guide in CONTRIBUTING.md :)
You're also welcome to join our Intel DevHub Discord server, where you can receive chat support from OpenVINO developers.
> @p-wysocki I am new to open-source, could you please provide some guidance relevant to the above issue? I want to get started; I think the task is just to reshape the input twice, but I need some understanding of the code...
That is exactly right, you need to reshape the input twice, but the main problem is calculating the shape to reshape to when the input shape is only known at runtime.
I am very sorry, but do you know the deadline for this bug?
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hi. I am new to open-source contributing, but I am excited to start. I am going to participate in GSoC '24 and am choosing the OpenVINO organization as well. Currently I'm pursuing a Bachelor's degree in CSE (AI and Machine Learning) and also have some experience with deep learning and neural networks.
The problem here is that we need batch size initialization, and "channels" refers to the image components of an image, e.g. RGB, CMY, grayscale, etc. In a CNN it typically means the depth dimension of the input tensor.
OpenVINO's MaxPool and AvgPool operations assume that batch and channel dimensions exist. Inputs that don't have a batch dimension need to be assigned a batch of 1, and then restored to their original shape after the operation.
In the code linked below, we can try adding a block that checks whether the image has a batch dimension and, if not, assigns a batch of 1.
I am writing the code here since I don't know if there is a separate section for providing code as solutions. If any mentor could help me with how to use pull requests and how to know whether I've solved an issue, it would be really helpful :)
In this repo, we can try the following code (this is my first open-source contribution, please be kind :) ): https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/avg_poolnd.cpp
OutputVector translate_avg_poolnd(const NodeContext& context) {  // Same code as in the linked file, only
    auto input = context.get_input(0);
    // Here we check if the image tensor has a batch dimension. We check whether the rank
    // (number of dimensions) of the input tensor is known to us at compile time,
    // and then whether it is less than 3.
    if (input.get_partial_shape().rank().is_static() && input.get_partial_shape().rank().get_length() < 3) {
        // now we add a batch size of 1
        auto batch_size = context.mark_node(v0::Constant::create(element::i32, Shape{}, {1}));
        input = context.mark_node(std::make_shared<v1::Broadcast>(input, batch_size));
    }
    // the rest of the code is the same as in the linked file
    return {context.mark_node(
        std::make_shared<v1::AvgPool>(input, strides, pads, pads, kernel, !count_include_pad, rounding_type))};
}
Please feel free to guide me more and notify me if anything comes up.
@krish1209 Please check the contribution guide for tips on how to contribute and create pull requests.
The code you provided will not actually work as you expect, because Broadcast accepts a shape to broadcast to; when you provide 1, you ask to broadcast the tensor to a scalar, which should fail. I would recommend you use the Unsqueeze operation instead. The main difficulty in this task is that pooling can be done for many shapes: NCH, CH, NCHW, CHW, NCHWD, CHWD. You may want to use information about which torch operation is used, e.g. aten::max_pool2d or aten::max_pool1d, which will help you understand which dimensions need to be pooled.
Also, please write tests for all these cases first and make sure your solution passes them all.
If you have any questions, please feel free to ask.
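The distinction the reviewer points out can be seen at the shape level. The following is an illustrative plain-C++ sketch, not the OpenVINO API; `unsqueeze_shape` is a hypothetical helper mirroring what Unsqueeze does to a tensor's shape, while Broadcast instead takes a target shape as its second input (so passing 1 asks to broadcast the whole tensor to shape {1}):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape-level helper (not the OpenVINO API) mirroring Unsqueeze:
// insert a new dimension of size 1 at position `axis`, leaving all other
// dimensions untouched.
std::vector<std::int64_t> unsqueeze_shape(std::vector<std::int64_t> shape, std::size_t axis) {
    shape.insert(shape.begin() + static_cast<std::ptrdiff_t>(axis), 1);
    return shape;
}
```

Unsqueezing a CHW shape at axis 0 yields 1CHW, which is exactly the batch-of-1 wrapping the issue asks for.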
Okay, I'll wrap my head around it. Meanwhile, could you let me know how you are thinking of approaching this?
As of now I am not aware of how to write tests for these cases, but I'll learn how.
Is it possible to handle the two operations with if-else statements? Like aten::max_pool2d and aten::max_pool1d?
For example:
if (context.get_op_type() == "aten::max_pool2d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{3}));
} else if (context.get_op_type() == "aten::max_pool1d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{2}));
}
Similarly, for 3d we get Shape{4}; for 4d, we get Shape{5}.
Also, could you please provide the link to Discord? The previous link is not working.
Please let me know the status of the problem, and whether there are suggested or recommended issues I can get my hands on. Thank you so much.
Thank you for your patience, and sorry for the late reply.
> Is it possible to handle the two operations with if-else statements? Like aten::max_pool2d and aten::max_pool1d?
@mvafin could you take a look?
> Also, could you please provide the link to Discord?
The link has been updated in the CONTRIBUTING.md
- Intel DevHub Discord server.
Hello @krish1209, are you still willing to contribute?
@mvafin could you please take a look at the latest question if @krish1209 is still here?
Yes yes I am willing to contribute. Please let me know
@krish1209 Yes, but instead of if-else it is better to create a common function plus per-dimensionality functions that call it with the number of dimensions. For example, this is done for adaptive_pool: https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/adaptive_poolnd.cpp
I'll look into it; I got a little busy with my college projects, sorry. Will get back to you!
Based on the code written for adaptive pooling, I have written similar common functions for the max_pool and avg_pool operations. I will provide the code with a little explanation, and if it is seen fit, please allow me to create a pull request based on it, as it would also help me towards GSoC :). This is the code; please check it and guide me if it is wrong or can be done better.
// This is the common function for defining the max pool operation
OutputVector translate_max_pool_base(const NodeContext& context,
                                     const Output<Node>& input,
                                     const Shape& kernel,
                                     const Strides& strides = Strides{},
                                     const Shape& pads = Shape{},
                                     const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR);

// For 1D max pooling
OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.get_single_value<int64_t>(context.get_input_shape(1))};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}

// For 2D max pooling
OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_h = context.get_single_value<int64_t>(context.get_input_shape(1));
    const auto kernel_w = context.get_single_value<int64_t>(context.get_input_shape(2));
    const auto kernel_shape = Shape{kernel_h, kernel_w};
    // Optional arguments (strides, pads, dilations) can be handled here
    const Strides strides = context.get_input_size() > 5
        ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(5)) : Strides{};
    const Shape pads = context.get_input_size() > 3
        ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(3)) : Shape{};
    const Strides dilations = context.get_input_size() > 4
        ? context.get_single_value<std::vector<int64_t>>(context.get_input_shape(4)) : Strides{};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape, strides, pads, dilations);
}

// For 3D max pooling (similar to 2D but with an extra dimension)
OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_d = context.get_single_value<int64_t>(context.get_input_shape(1));
    const auto kernel_h = context.get_single_value<int64_t>(context.get_input_shape(2));
    const auto kernel_w = context.get_single_value<int64_t>(context.get_input_shape(3));
    const auto kernel_shape = Shape{kernel_d, kernel_h, kernel_w};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}
@krish1209 You seem to be on the right track, but there are some issues you need to pay attention to:
- Please note that there is no context.get_single_value in pytorch::NodeContext; I think you want to use context.const_input.
- You should avoid using context.get_input_shape, because it is not guaranteed to return a static shape; for example, in the scripting case it will return a fully dynamic shape with dynamic rank.
- The approach that was used for adaptive pooling wasn't good, and it will not work here, since you need to reshape the input first (CHW -> 1CHW) and, after pooling, remove that extra dimension you inserted. I would recommend you pass the number of pooling dimensions to the base function and do a Reshape to the shape [-1, 0, 0, 0] (pool2d case) with special zero (0 means do not change that dimension). To reshape back, you can use Slice to extract shape_before_pool[:-2] and shape_after_pool[-2:] (2d case) and concat those two shapes to create the reshape after.
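The reshape-back step the reviewer describes amounts to concatenating the leading (non-pooled) dimensions of the input's original shape with the trailing pooled dimensions of the output shape. The following is an illustrative plain-C++ sketch of that shape arithmetic; in the real conversion rule it would be built from ShapeOf, Slice, and Concat nodes operating on runtime shapes, and `reshape_back_shape` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Shape of the Reshape that undoes the batch-of-1 wrapping after pooling:
// keep the leading dimensions the input had before pooling and the trailing
// n_pool_dims dimensions the pooled output has (shape_before_pool[:-n] ++
// shape_after_pool[-n:] in the reviewer's notation).
std::vector<std::int64_t> reshape_back_shape(const std::vector<std::int64_t>& before_pool,
                                             const std::vector<std::int64_t>& after_pool,
                                             std::size_t n_pool_dims) {
    std::vector<std::int64_t> out(before_pool.begin(), before_pool.end() - n_pool_dims);
    out.insert(out.end(), after_pool.end() - n_pool_dims, after_pool.end());
    return out;
}
```

For example, a CHW input of {3, 32, 32} pooled (as 1CHW) to {1, 3, 16, 16} with 2 pooling dimensions yields the target shape {3, 16, 16}.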
So I tried this, considering the scripting case as well as the need to reshape the input first and remove the inserted extra dimension afterwards.
I modified the translate_max_pool_base function to include an additional parameter, pooling_dims. This parameter indicates the number of pooling dimensions, allowing for flexibility in specifying whether the pooling operation is 1D, 2D, or 3D.
Then, to address the problem in the previous code (i.e. when the input tensor's dimensions do not correspond to the pooling dimensions, for example due to an added batch dimension), the pooling dimension is explicitly passed as an argument to translate_max_pool_base.
This is a glimpse of how it would look:
OutputVector translate_max_pool_base(const NodeContext& context, const Output<Node>& input,
                                     const Shape& kernel, const Strides& strides = Strides{},
                                     const Shape& pads = Shape{}, const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR,
                                     int pooling_dims = 2);  // Added parameter for pooling dimensions

OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {},
                                   RoundingType::FLOOR, 1);  // Pooling dimensions set for 1D
}

OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    const auto input_size = context.const_input(0).sizes();
    const int64_t num_dims = input_size.size();
    Strides strides, pads, dilations;
    if (num_dims > 3) {
        strides = Strides{context.const_input(5).to_vector<int64_t>()};
        pads = Strides{context.const_input(3).to_vector<int64_t>()};
        dilations = Strides{context.const_input(4).to_vector<int64_t>()};
    }
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, strides, pads, dilations,
                                   RoundingType::FLOOR, 2);  // Pooling dimensions set for 2D
}

OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {},
                                   RoundingType::FLOOR, 3);  // Pooling dimensions set for 3D
}
One thing to note is that it assumes the kernel shape is provided as a vector of int64_t values.
Please guide me further, as I am slowly learning more and more :)
@krish1209 That looks okay; the only thing I don't understand is why you do an extra step for the 2d case when num_dims > 3? And the main difficulty will be implementing the base function.
It is to cover the case where the pooling operation is 2d (channels, height, width) but additional dimensions, such as a batch dimension, might be present. The extra step handles the optional arguments for strides, pads, and dilations.
To correctly handle the optional arguments for the pooling operation, the code above checks whether num_dims is greater than 3 and, if so, extracts the values for strides, pads, and dilations from the context's input tensors accordingly. This makes sure the pooling operation is performed correctly even if the input tensor has additional dimensions beyond the expected 2D shape.
I will then proceed to implement the base function, if this is fine.
Hello guys, I'm running into this issue and am looking forward to it being fixed. My MaxPool3d layer is causing the problem:
nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 1, 1))
Are there any ways to work around this issue instead of waiting for new updates?