ml-fastvit
[Bug] Include last attention layer in feature output
The last output index should be 7 instead of 6, at least for the SA12 architecture. On SA12, returning index 6 as the final feature skips the last attention layer, while index 7 includes it. The timm implementation does include the final attention layer in its feature output. I have trained both variants for segmentation on ADE20K using mmsegmentation with this configuration:
```python
model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    backbone=dict(
        type='FastViTSA12',
        pretrained=True,
    ),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],
        out_channels=256,
        num_outs=4,
    ),
    decode_head=dict(
        type='FPNHead',
        in_channels=[256, 256, 256, 256],
        in_index=[0, 1, 2, 3],
        feature_strides=[4, 8, 16, 32],
        channels=128,
        dropout_ratio=0.1,
        num_classes=1,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=1.0,
        ),
    ),
)
```
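To make the indexing concrete, here is a minimal pure-Python sketch (illustrative only, not the actual ml-fastvit code) of how an `out_indices` tuple controls which stage outputs a backbone returns; the 8-stage layout and the `extract_features` helper are assumptions standing in for SA12, where the final self-attention stage sits at index 7:

```python
def extract_features(blocks, out_indices, x):
    """Run x through each block in order, collecting an (index, output)
    pair for every index listed in out_indices."""
    outs = []
    for i, block in enumerate(blocks):
        x = block(x)
        if i in out_indices:
            outs.append((i, x))
    return outs

# Eight toy stages standing in for SA12's stage sequence; index 7
# plays the role of the final attention stage.
blocks = [(lambda v: v + 1) for _ in range(8)]

# Ending out_indices at 6 never returns the output of stage 7:
feats_old = extract_features(blocks, (0, 2, 4, 6), 0)

# Ending at 7 includes the final (attention) stage in the features:
feats_new = extract_features(blocks, (0, 2, 4, 7), 0)
```

With the first tuple the deepest returned feature comes from stage 6, so everything computed by stage 7 is silently discarded; the second tuple is the behavior this issue is asking for.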
Some of the differences are:
| Model | Parameters | ADE20k Val mIoU |
|---|---|---|
| Apple FastViT SA12 FPN | 8.3M | 30 |
| Timm FastViT SA12 FPN | 14.6M | 39 |
With the final attention layer included, both the parameter count and the performance line up much more closely with the paper's reported numbers.