TubeViT
Number of tokens differs from the paper
The paper reports 559 tokens (Section 4.1), but my implementation produces 539. The paper describes the tubes as follows:
- 8 x 8 x 8 with a stride of (16, 32, 32)
- 16 x 4 x 4 with a stride of 6 x 32 x 32 and an offset of (4, 8, 8)
- 4 x 12 x 12 with a stride of 16 x 32 x 32 and an offset of (0, 16, 16)
- 1 x 16 x 16 with a stride of (32, 16, 16).
For an input of 32 x 224 x 224, this results in only 559 tokens.
The number of tokens per tube in my implementation:
- 8 x 8 x 8 with a stride of (16, 32, 32) -> 98
- 16 x 4 x 4 with a stride of 6 x 32 x 32 and an offset of (4, 8, 8) -> 147
- 4 x 12 x 12 with a stride of 16 x 32 x 32 and an offset of (0, 16, 16) -> 98
- 1 x 16 x 16 with a stride of (32, 16, 16) -> 196
The total number of tokens is 98 + 147 + 98 + 196 = 539.
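The per-tube counts above can be reproduced with a short sketch. It assumes each tube acts like an unpadded strided 3D convolution, and that the offset simply skips that many positions at the start of each axis, so the count per axis is `floor((size - offset - kernel) / stride) + 1`. The function name `tube_token_count` is hypothetical and used only for illustration; whether the paper handles offsets or padding differently is not stated here.

```python
from math import prod


def tube_token_count(input_size, kernel, stride, offset=(0, 0, 0)):
    """Count tokens for one tube (hypothetical helper).

    Assumes no padding, with the offset cropping the start of each axis,
    so each axis yields floor((size - offset - kernel) / stride) + 1 positions.
    """
    per_axis = [
        (size - off - k) // s + 1
        for size, k, s, off in zip(input_size, kernel, stride, offset)
    ]
    return prod(per_axis)


# Input clip: frames x height x width
INPUT = (32, 224, 224)

# (kernel, stride, offset) for the four tubes listed above
TUBES = [
    ((8, 8, 8), (16, 32, 32), (0, 0, 0)),
    ((16, 4, 4), (6, 32, 32), (4, 8, 8)),
    ((4, 12, 12), (16, 32, 32), (0, 16, 16)),
    ((1, 16, 16), (32, 16, 16), (0, 0, 0)),
]

counts = [tube_token_count(INPUT, k, s, o) for k, s, o in TUBES]
total = sum(counts)
print(counts, total)  # [98, 147, 98, 196] 539
```

Under these assumptions the four tubes give 2x7x7, 3x7x7, 2x7x7 and 1x14x14 positions. The 20-token gap to 559 would therefore have to come from a different padding, offset, or stride convention, or from additional tokens not counted here.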