1day_1paper [18] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (SETR)

Open dhkim0225 opened this issue 3 years ago • 0 comments

Let's do segmentation with a Transformer!

SETR

It's simple.

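The input is 480x480x3 and is split into 16x16-pixel patches, giving a 30x30 grid (900 patches), so each patch has size (16, 16, 3). Each patch is flattened, linearly projected, and passed through 24 Transformer encoder layers. This corresponds to (a) in the figure. (~A few CNN layers up front instead of the linear projection would probably have worked better~)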
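A rough NumPy sketch of the patch split and linear projection, just to make the shapes concrete. The embedding width `D = 1024` and the random weights are assumptions for illustration, not values taken from these notes; the encoder itself is left out.

```python
import numpy as np

H = W = 480   # input size from the notes
P = 16        # patch side in pixels
D = 1024      # embedding width (assumed for illustration)

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, 3)).astype(np.float32)

# Cut the image into a (H/P, W/P) grid of P x P patches.
grid_h, grid_w = H // P, W // P
patches = image.reshape(grid_h, P, grid_w, P, 3).transpose(0, 2, 1, 3, 4)

# Flatten every patch into one vector: a sequence of 900 tokens.
tokens = patches.reshape(grid_h * grid_w, P * P * 3)

# Linear projection of each flattened patch (random stand-in weights).
W_proj = rng.standard_normal((P * P * 3, D)).astype(np.float32) * 0.02
embeddings = tokens @ W_proj

print(patches.shape, tokens.shape, embeddings.shape)
```

The sequence `embeddings` is what would be fed to the 24 encoder layers; the output keeps the same length, so it can be reshaped back to a 30x30 feature map for the decoder.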

Decoder

Naive upsampling (Naive)
1. A 2-layer network (1×1 conv + sync BN + ReLU + 1×1 conv) maps the channel dimension to the number of classes (Cityscapes ==> 19).
2. Plain bilinear upsampling then restores the full resolution.
Progressive UPsampling (PUP)
1. Corresponds to (b) in the figure.
2. Conv layers raise the image resolution step by step.
Multi-Level feature Aggregation (MLA)
1. Corresponds to (c) in the figure.
2. Uses the outputs of layers 6, 12, 18, and 24.
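A minimal NumPy sketch of the Naive head, plus the shape bookkeeping for PUP and MLA. Several things here are assumptions for illustration: the hidden width of 256, the random weights, a single-image per-channel normalization standing in for sync BN, corner-aligned bilinear sampling, and a 2x factor per PUP step (which is how I read the paper, not something stated in the notes above).

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution on an (H, W, C_in) map, i.e. a per-pixel matmul."""
    return x @ weight

def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of an (H, W, C) map with corner-aligned sampling."""
    in_h, in_w, _ = x.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :, None]   # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

rng = np.random.default_rng(0)
grid, D, hidden, num_classes = 30, 1024, 256, 19  # hidden=256 is assumed

# Encoder output reshaped from 900 tokens back to a 30x30 map.
features = rng.standard_normal((grid, grid, D))
w1 = rng.standard_normal((D, hidden)) * 0.02
w2 = rng.standard_normal((hidden, num_classes)) * 0.02

# Naive head: 1x1 conv -> norm -> ReLU -> 1x1 conv -> bilinear upsample.
x = conv1x1(features, w1)
x = (x - x.mean(axis=(0, 1))) / np.sqrt(x.var(axis=(0, 1)) + 1e-5)  # stand-in for sync BN
x = np.maximum(x, 0.0)
logits = conv1x1(x, w2)                    # (30, 30, 19)
full = bilinear_resize(logits, 480, 480)   # (480, 480, 19)

# PUP: spatial size after each step, assuming 2x upsampling per step.
pup_sizes = [grid * 2 ** k for k in range(5)]

# MLA: encoder layers whose outputs are aggregated (1-indexed).
mla_layers = list(range(6, 25, 6))

print(full.shape, pup_sizes, mla_layers)
```

Under the 2x assumption PUP needs four upsampling steps to get from 30x30 to 480x480, whereas the Naive head does the whole 16x jump in a single bilinear resize.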