AdaptiveAvgPool2D layer

Open Rocketknight1 opened this issue 2 years ago • 15 comments

Short Description: At Hugging Face we've seen a few PyTorch vision transformer models using AdaptiveAvgPool2D. In a lot of cases these are just resizing to (1,) or (1, 1), in which case they're a roundabout way to compute torch.mean(), but in some cases they're actually using this layer's full functionality.
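
For the (1, 1) case the equivalence is easy to see; here is a minimal illustration in plain PyTorch (a throwaway example, not taken from any particular model):

import torch

x = torch.randn(8, 3, 192, 192)  # NCHW, as Torch models expect
pooled = torch.nn.AdaptiveAvgPool2d((1, 1))(x)

# Identical to a plain mean over the spatial dimensions
same = x.mean(dim=(2, 3), keepdim=True)
print(torch.allclose(pooled, same))  # True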

The problem with AdaptiveAvgPool2D is that it computes its pooling windows in a unique way, and the size of the windows can vary. This makes it impossible to implement with a standard pooling layer, and very annoying to port to TF, especially if you want to load weights from a model that was trained with the Torch layer. There is an implementation in tensorflow-addons, but it uses fixed-size windows and so does not match the output of the Torch layer unless the input size is an integer multiple of the output size.
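
Roughly, and as the StackOverflow post linked below explains in detail, for output index i the Torch layer pools over the 1D window [floor(i * in_size / out_size), ceil((i + 1) * in_size / out_size)), applied independently to each spatial dimension. A tiny illustration of why the windows end up variable-sized and overlapping (hypothetical helper, plain Python):

import math

def torch_style_windows(in_size, out_size):
    # (start, end) indices of each 1D pooling window, per the Torch formula
    return [(math.floor(i * in_size / out_size),
             math.ceil((i + 1) * in_size / out_size))
            for i in range(out_size)]

print(torch_style_windows(6, 3))  # [(0, 2), (2, 4), (4, 6)] -> uniform windows
print(torch_style_windows(5, 3))  # [(0, 2), (1, 4), (3, 5)] -> variable and overlapping

A fixed-window implementation can only reproduce the first case, which is why the tensorflow-addons version diverges whenever the sizes don't divide evenly.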

We made a reasonably performant TF version that does correctly match the Torch layer in all cases - do you need this in keras-cv? We'd be happy to add it as a PR if it's useful.

Existing Implementations: Torch layer

Other information: See this StackOverflow post for a good description of how the Torch layer works internally.

Rocketknight1 avatar Jun 22 '22 16:06 Rocketknight1

Really interesting. Is this layer required to get strong performance in vision transformer models?

I see no issue in including it, but I want to be sure it fits a use case. Let me know if you have more information as to "why" it is used.

LukeWood avatar Jun 22 '22 18:06 LukeWood

I suspect that the layer is never really necessary when you're designing a model from scratch: you could always use normal pooling layers and choose the strides and widths appropriately. The need for it in TF mostly arises when someone has trained a model in PyTorch and you want to reimplement their model and load their weights; if you then write a slightly different pooling layer, you'll probably break compatibility.
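
For example, when the input size divides evenly by the output size, a plain average pooling layer with a matching window and stride already gives the same result as adaptive pooling, so nothing special is needed (rough sketch, NHWC):

import tensorflow as tf

inputs = tf.random.uniform((8, 192, 192, 3))

# 192 -> 64 is an exact 3x reduction, so a fixed 3x3 window with stride 3
# is equivalent to adaptive average pooling to (64, 64).
outputs = tf.keras.layers.AveragePooling2D(pool_size=3, strides=3)(inputs)
print(outputs.shape)  # (8, 64, 64, 3)

It's only when the sizes don't divide evenly, or when you need to match a pretrained Torch model exactly, that the adaptive layer really matters.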

It's commonly used in pyramid pooling modules (paper, >8000 citations), e.g. in BeiT (>200 citations, code sample) and Data2Vec. There is also a tensorflow-addons port of that module, but as it depends on the TFA implementation of adaptive pooling, results do not match Torch pyramid pooling modules.

Rocketknight1 avatar Jun 22 '22 19:06 Rocketknight1

That said, it's totally okay if you want to leave it out for now - I linked it in the gist above, so feel free to just close this for now and add the layer later if/when you find out you need it for a model.

Rocketknight1 avatar Jun 22 '22 19:06 Rocketknight1

> That said, it's totally okay if you want to leave it out for now - I linked it in the gist above, so feel free to just close this for now and add the layer later if/when you find out you need it for a model.

I think this sounds like a good addition, I'm mainly just curious if the only reason anyone uses it is backwards compatibility. Do people continue to use it a lot in the pytorch world today?

LukeWood avatar Jun 22 '22 19:06 LukeWood

Do I understand correctly that at train/inference this layer always uses the same stride/width sizes, and that it's essentially a way of auto-selecting those values?

LukeWood avatar Jun 22 '22 19:06 LukeWood

Yes, that's correct. However, it doesn't exactly have a 'stride' in the usual sense. It basically splats (potentially overlapping) pooling windows all across the input so as to get the desired output shape, but the spacing between these windows is usually not constant, unless the input size is an integer multiple of the output size. The width of the windows can also vary in different locations.

Rocketknight1 avatar Jun 22 '22 20:06 Rocketknight1

I see, can you paste a snippet of the API we would want to use? Do you specify the output shape?

LukeWood avatar Jun 22 '22 20:06 LukeWood

Sure, the API is just:

layer = AdaptiveAvgPool2D(output_dims=(128, 128))

# Can also support NCHW, but we use NHWC here
inputs = tf.ones((8, 192, 192, 3), dtype=tf.float32)
outputs = layer(inputs)

# outputs.shape is (8, 128, 128, 3)

In other words, you specify the output shape at init, and then whatever Tensor you pass in gets pooled down to the desired output shape, with the required pooling windows being calculated in the call(). The same layer can handle multiple different input shapes without needing to be rebuilt.
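
For reference, here's a rough sketch of one way to get exact Torch-matching behaviour with that API. This is just an illustration (per-dimension averaging matrices applied with einsum), not necessarily how our actual implementation works, and it assumes NHWC inputs with statically known spatial dims:

import math
import tensorflow as tf

def _pool_matrix(in_size, out_size):
    # Row i averages the Torch-style window
    # [floor(i * in_size / out_size), ceil((i + 1) * in_size / out_size)).
    rows = []
    for i in range(out_size):
        start = math.floor(i * in_size / out_size)
        end = math.ceil((i + 1) * in_size / out_size)
        rows.append([1.0 / (end - start) if start <= j < end else 0.0
                     for j in range(in_size)])
    return tf.constant(rows, dtype=tf.float32)  # shape (out_size, in_size)

class AdaptiveAvgPool2D(tf.keras.layers.Layer):
    def __init__(self, output_dims, **kwargs):
        super().__init__(**kwargs)
        self.output_dims = output_dims

    def call(self, inputs):
        # Windows are computed here, so the same layer handles any input size.
        out_h, out_w = self.output_dims
        in_h, in_w = inputs.shape[1], inputs.shape[2]  # assumes static spatial dims
        mh = _pool_matrix(in_h, out_h)  # (out_h, in_h)
        mw = _pool_matrix(in_w, out_w)  # (out_w, in_w)
        x = tf.einsum("nhwc,oh->nowc", inputs, mh)  # pool the height dimension
        return tf.einsum("nowc,pw->nopc", x, mw)    # pool the width dimension

Because each window is a uniform average, pooling one axis at a time this way gives the same result as averaging over the full 2D window.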

Rocketknight1 avatar Jun 22 '22 21:06 Rocketknight1

Gotcha, that is a pretty interesting feature.

One last quick Q, does it handle upscaling too?

layer = AdaptiveAvgPool2D(output_dims=(128, 128))
# Can also support NCHW, but we use NHWC here
inputs = tf.ones((8, 192, 192, 3), dtype=tf.float32)
outputs = layer(inputs)
# outputs.shape is (8, 128, 128, 3)

upscaled = AdaptiveAvgPool2D(output_dims=(256, 256))(outputs)
# upscaled.shape would be (8, 256, 256, 3)

LukeWood avatar Jun 22 '22 21:06 LukeWood

In Torch it might (though I don't think this is a common/intended use case), but because I implemented it using normal pooling layers in TF I don't think it would work in my implementation. If needed, I could add a different code path for upscaling, but I haven't seen any code in the wild where people use it for that.

Rocketknight1 avatar Jun 23 '22 11:06 Rocketknight1

for sure, thanks.

My only real concern here is that this isn't idiomatic to Keras. In PyTorch you specify output features all the time, but in Keras that is computed for you. So it is a little strange to have this in Keras, BUT I will say that for compatibility purposes it could be valuable.

LukeWood avatar Jun 23 '22 17:06 LukeWood

Yeah, I'd say it's mostly (entirely?) useful for PyTorch model compatibility, so I get that it might feel out of place. But still, let us know if you want it anyway, and we'll make a PR!

Rocketknight1 avatar Jun 23 '22 17:06 Rocketknight1

thanks for the offer!

I'd like to hear @fchollet and @tanzhenyu 's opinion on compatibility layers like this. Our long-term goal is to not port weights, but perhaps this is low enough cost that the benefit outweighs the cost.

LukeWood avatar Jun 23 '22 17:06 LukeWood

@Rocketknight1 Do you also implement 1D and 3D versions of this layer?

innat avatar Jul 21 '22 11:07 innat

Hi @innat, I could, yes! If you look at the gist I linked above, the pseudo_1d_pool function is basically just a 1D AdaptivePool, so that would be very easy to implement as a separate layer. To do a 3D pool, I would just do a 1D pool on each of the 3 dimensions.

Rocketknight1 avatar Jul 21 '22 16:07 Rocketknight1

Closing this until we have a strong use case!

LukeWood avatar May 04 '23 18:05 LukeWood