
[41] PVTv2: Improved Baselines with Pyramid Vision Transformer


A follow-up to PVT (#68). A short, simple 6-page paper.

paper / code

Three things were added:

  1. overlapping patch embedding
  2. convolutional feedforward networks
  3. linear complexity attention layers

Straight to the point.

Improved Pyramid Vision Transformer

Limitations in PVTv1

  1. Overlapping patch embeddings (as in Swin) help, but PVTv1 didn't use them
  2. The positional encoding is fixed in size, which makes arbitrary-size inference hard (Swin also solved this in v2 (#59))
  3. With high-resolution inputs, the attention complexity still ends up blowing up

Overlapping Patch Embedding

A different approach from Swin: overlapping is implemented simply by adding zero-padding. The positional encoding is removed entirely, with the claim that the zero-padding takes over its role.
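A minimal sketch of overlapping patch embedding, assuming PyTorch. The module name is illustrative, and the hyperparameters (7x7 kernel, stride 4, zero-padding 3) are the paper's stage-1 stem setting; later stages use a smaller kernel and stride.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        # zero-padding makes neighboring windows overlap and (per the paper)
        # leaks enough positional information to drop positional encodings
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=stride,
                              padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H/4, W/4)
        B, D, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)       # (B, N, D) token sequence
        return self.norm(x), H, W

tokens, H, W = OverlapPatchEmbed()(torch.randn(1, 3, 224, 224))
```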

Convolutional Feed-Forward

A 3x3 depthwise conv is added between the two FC layers of the feed-forward network; see the sketch below.
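A minimal sketch of the convolutional feed-forward block, assuming PyTorch; module and parameter names are illustrative. Tokens are reshaped back into a feature map so the depthwise conv can run spatially.

```python
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        # depthwise: one 3x3 filter per channel, zero-padded to keep H x W
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):                     # x: (B, N, dim), N = H*W
        x = self.fc1(x)
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)   # tokens -> feature map
        x = self.dwconv(x)
        x = x.flatten(2).transpose(1, 2)            # feature map -> tokens
        return self.fc2(self.act(x))
```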

Linear Spatial Reduction Attention

lol, at first I thought they had actually made attention itself linear. They just swapped the conv that was doing the spatial reduction for pooling.
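A minimal sketch of the idea, assuming PyTorch: keys and values are average-pooled to a fixed 7x7 grid, so the attention cost is O(N x 49) at any input resolution, i.e. linear in the number of tokens. Using nn.MultiheadAttention here is a simplification; the paper additionally passes the pooled map through a 1x1 conv, LayerNorm, and GELU, omitted here for brevity.

```python
import torch
import torch.nn as nn

class LinearSRA(nn.Module):
    def __init__(self, dim=64, num_heads=1, pool_size=7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pool_size)  # fixed-size K/V grid
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):                 # x: (B, N, dim), N = H*W
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.pool(kv).flatten(2).transpose(1, 2)  # (B, 49, dim)
        out, _ = self.attn(query=x, key=kv, value=kv)  # cost O(N * 49)
        return out
```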

Models


Results

Image classification


Object Detection


Instance Segmentation

COCO val2017
