EVA Why no ViT-Adapter for semantic segmentation on ADE20K for EVA02?

Status: Open · tommiekerssies opened this issue 1 year ago · 5 comments

Title says it all

Jul 06 '23 15:07 tommiekerssies

We found that using ViT-Adapter degrades the performance of EVA-02.

Jul 12 '23 10:07 Yuxin-CV

Very interesting, do you have any intuition why that may be?

Jul 14 '23 09:07 tommiekerssies

@Yuxin-CV Hello, I have a question related to the application of EVA-02 for semantic segmentation. Since ViT-Adapter is not used, does this imply that all feature maps received by the task layer are at 1/16 of the original resolution? Similarly, is the output of the task layer (prior to final interpolation) also at 1/16 of the original resolution? Or is there any technique employed to obtain hierarchical feature maps from the backbone for semantic segmentation? I couldn't find explicit details in the EVA-02 paper. Thank you.

UPDATE: Found the answer in the code: https://github.com/baaivision/EVA/blob/7389aeeec97c056fc8424fa6b78f35c6f1b07d0d/EVA-02/seg/backbone/eva2.py#L610-L623

Jul 22 '23 04:07 function2-llx
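The linked lines are not reproduced in this thread. Assuming they follow the BEiT-style pattern for getting a feature pyramid out of a plain ViT (an assumption, consistent with the later mention of a BatchNorm in that code), the idea is: reshape the stride-16 patch tokens into a single 1/16 map, then derive 1/4 and 1/8 maps with stride-2 transposed convolutions, keep the 1/16 map as-is, and produce a 1/32 map with max pooling. The minimal PyTorch sketch below illustrates this. The names `SimpleFeaturePyramid`, `tokens_to_map`, and `fpn1` to `fpn4` are hypothetical, `nn.BatchNorm2d` stands in for whatever norm the repository actually uses (possibly a synchronized variant for multi-GPU training), and the 512x512 input with `embed_dim=64` is an arbitrary example.

```python
import torch
from torch import nn


def tokens_to_map(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Hypothetical helper: (B, N, C) patch tokens -> (B, C, h, w) map.

    Assumes any class token has already been dropped, so N == h * w.
    """
    b, n, c = tokens.shape
    assert n == h * w, "token count must match the patch grid"
    return tokens.transpose(1, 2).reshape(b, c, h, w)


class SimpleFeaturePyramid(nn.Module):
    """Hypothetical BEiT-style neck: one stride-16 map -> strides 4, 8, 16, 32."""

    def __init__(self, embed_dim: int) -> None:
        super().__init__()
        # 1/16 -> 1/4: two stride-2 transposed convs with norm + GELU in between.
        # (Assumption: the real code may use a synchronized BatchNorm here.)
        self.fpn1 = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim),
            nn.GELU(),
            nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
        )
        # 1/16 -> 1/8: a single stride-2 transposed conv.
        self.fpn2 = nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2)
        # 1/16 -> 1/16: passed through unchanged.
        self.fpn3 = nn.Identity()
        # 1/16 -> 1/32: 2x2 max pooling.
        self.fpn4 = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (B, C, H/16, W/16); returns maps at 1/4, 1/8, 1/16, 1/32 resolution.
        return [self.fpn1(x), self.fpn2(x), self.fpn3(x), self.fpn4(x)]


# Example: a 512x512 image with patch size 16 gives a 32x32 token grid.
neck = SimpleFeaturePyramid(embed_dim=64)
tokens = torch.randn(2, 32 * 32, 64)
feats = neck(tokens_to_map(tokens, 32, 32))
print([tuple(f.shape[-2:]) for f in feats])
# [(128, 128), (64, 64), (32, 32), (16, 16)]
```

So the backbone itself still works at 1/16 throughout. Only this lightweight neck produces the hierarchical maps that a UperNet-style decode head expects, which is the role ViT-Adapter would otherwise play.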

@function2-llx Great, thank you for sharing!

Jul 24 '23 10:07 tommiekerssies

I wonder if a LayerNorm would also work here or if it has to be a BatchNorm. Is there literature on this?

Jul 24 '23 10:07 tommiekerssies
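One way to test the LayerNorm question empirically is to make the norm inside the 1/16 to 1/4 upsampling branch pluggable and train both variants. The sketch below is a hypothetical setup, not the repository's code: `LayerNorm2d` normalizes over the channel dimension at each spatial position of an NCHW map, which is the usual way to apply LayerNorm to conv features, and `make_upsample_4x` is an illustrative helper name. Whether the LayerNorm variant matches BatchNorm's accuracy is not something this thread establishes.

```python
import torch
from torch import nn


class LayerNorm2d(nn.Module):
    """Channel-wise LayerNorm for (B, C, H, W) maps: normalizes over C per pixel."""

    def __init__(self, num_channels: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(num_channels, eps=eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move channels last, normalize over them, then restore NCHW layout.
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


def make_upsample_4x(embed_dim: int, norm: str = "bn") -> nn.Sequential:
    """Hypothetical 1/16 -> 1/4 branch with a selectable norm ("bn" or "ln")."""
    if norm == "bn":
        norm_layer: nn.Module = nn.BatchNorm2d(embed_dim)
    elif norm == "ln":
        norm_layer = LayerNorm2d(embed_dim)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    return nn.Sequential(
        nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
        norm_layer,
        nn.GELU(),
        nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
    )


x = torch.randn(2, 64, 32, 32)          # a stride-16 feature map
y = make_upsample_4x(64, norm="ln")(x)  # upsampled to stride 4: (2, 64, 128, 128)
z = LayerNorm2d(64)(x)                  # per-pixel statistics over the 64 channels
```

A practical difference worth keeping in mind: BatchNorm's statistics depend on the batch (hence the synchronized variant for small per-GPU batches), while LayerNorm2d normalizes each pixel independently, so it behaves identically at any batch size.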
