EVA
Why no ViT-Adapter for semantic segmentation on ADE20K for EVA02?
Title says it all
We found that using ViT-Adapter degrades performance on EVA-02.
Very interesting, do you have any intuition why that may be?
@Yuxin-CV Hello, I have a question related to the application of EVA-02 for semantic segmentation. Since ViT-Adapter is not used, does this imply that all feature maps received by the task layer are at 1/16 of the original resolution? Similarly, is the output of the task layer (prior to final interpolation) also at 1/16 of the original resolution? Or is there any technique employed to obtain hierarchical feature maps from the backbone for semantic segmentation? I couldn't find explicit details in the EVA-02 paper. Thank you.
UPDATE: found the answer in code https://github.com/baaivision/EVA/blob/7389aeeec97c056fc8424fa6b78f35c6f1b07d0d/EVA-02/seg/backbone/eva2.py#L610-L623
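For readers who don't want to dig through the link: the linked code follows the common BEiT-style approach of expanding the single 1/16 ViT feature map into a four-scale pyramid with deconvolutions and pooling before the task layer. Below is a hedged sketch of that pattern, not the exact EVA-02 code (the embedding dimension is a small placeholder; EVA-02 variants use larger dims):

```python
import torch
from torch import nn

# Sketch of a BEiT-style "simple FPN": the plain ViT backbone emits one
# 1/16-resolution map, which is expanded to 1/4, 1/8, 1/16, 1/32 scales.
embed_dim = 64  # small placeholder for illustration; e.g. EVA-02-L uses 1024

fpn1 = nn.Sequential(                       # 1/16 -> 1/4 (two 2x deconvs)
    nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
    nn.BatchNorm2d(embed_dim),              # the norm asked about below
    nn.GELU(),
    nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2),
)
fpn2 = nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=2, stride=2)  # 1/16 -> 1/8
fpn3 = nn.Identity()                        # stays at 1/16
fpn4 = nn.MaxPool2d(kernel_size=2, stride=2)  # 1/16 -> 1/32

x = torch.randn(1, embed_dim, 32, 32)       # e.g. a 512x512 image at 1/16
features = [f(x) for f in (fpn1, fpn2, fpn3, fpn4)]
print([tuple(t.shape[-2:]) for t in features])
# -> [(128, 128), (64, 64), (32, 32), (16, 16)]
```

The resulting pyramid is what the segmentation task layer (e.g. UperNet) consumes in place of hierarchical backbone stages.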
@function2-llx Great, thank you for sharing!
I wonder if a LayerNorm would also work here or if it has to be a BatchNorm. Is there literature on this?
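Not an authoritative answer, but for reference: ViTDet's simple feature pyramid uses LayerNorm (applied channel-wise on NCHW tensors, as in ConvNeXt) rather than BatchNorm in an analogous deconv stack, so LayerNorm can work in this position in principle. A minimal channels-first LayerNorm that could be swapped in looks like:

```python
import torch
from torch import nn

class LayerNorm2d(nn.Module):
    """LayerNorm over the channel dim of an NCHW tensor (ConvNeXt/ViTDet style)."""
    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize each spatial position independently across channels,
        # unlike BatchNorm, which normalizes across the batch per channel.
        mean = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + self.eps)
        return x * self.weight[:, None, None] + self.bias[:, None, None]

norm = LayerNorm2d(8)
y = norm(torch.randn(2, 8, 4, 4))
print(y.shape)  # torch.Size([2, 8, 4, 4])
```

Whether it matches BatchNorm's accuracy here would need an ablation; I'm not aware of one specifically for EVA-02.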