A-ViT
A question about the halting score distribution code
In the paper, the halting score distribution is defined as below:

However, the corresponding code seems wrong: https://github.com/NVlabs/A-ViT/blob/120c9cb90acf86828f1c61dd42c08722aa7173c7/timm/models/act_vision_transformer.py#L464-L465
The shape of h_lst[1] is [B, N], so the code seems to average over the whole batch and to skip the first sample in the batch, rather than skipping the first token of each sample.
I think the right code is:
self.halting_score_layer.append(torch.mean(h_lst[1][:, 1:], dim=-1))
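To make the difference concrete, here is a minimal sketch on a toy tensor. It assumes the linked lines slice along the first dimension, roughly like `h[1:]`; that is my reading of the behaviour described above, not a quote of the repository. The tensor `h` and its values are made up for illustration and stand in for `h_lst[1]`.

```python
import torch

# Hypothetical halting scores for B=2 samples and N=3 tokens (made-up values).
h = torch.tensor([[0.0, 0.2, 0.4],
                  [1.0, 0.6, 0.8]])

# Slicing the first dimension drops sample 0 entirely, and torch.mean
# without `dim` then averages every remaining element into one scalar.
batch_sliced = torch.mean(h[1:])

# Slicing the second dimension drops token 0 of every sample, and
# dim=-1 averages over the remaining tokens, giving one value per sample.
token_sliced = torch.mean(h[:, 1:], dim=-1)

print(batch_sliced.shape)  # scalar, shape torch.Size([])
print(token_sliced.shape)  # per-sample, shape torch.Size([2])
```

With the first form, sample 0 never contributes to the result. With the second, every sample contributes and only the first token is excluded.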
Can you tell me which one is correct? Thanks!
I have the same question