mlp-mixer-pytorch
I want to know why dim, token_dim and channel_dim are initialized to 512, 256 and 2048?
One reason could be that GPU kernels are generally optimized for dimensions that are powers of 2, since those sizes map cleanly onto the hardware's memory tiles and allow the most efficient memory access.
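For context, here is a minimal sketch of one Mixer block showing where those three numbers land: `dim` is the per-patch feature width, `token_dim` is the hidden width of the token-mixing MLP, and `channel_dim` is the hidden width of the channel-mixing MLP, so they directly size the `nn.Linear` weight matrices. The `MixerBlock` class and the `num_patches` value below are illustrative, not the repo's exact API.

```python
import torch
from torch import nn

class MixerBlock(nn.Module):
    """Sketch of one Mixer block; dim/token_dim/channel_dim set the Linear widths."""
    def __init__(self, dim=512, num_patches=196, token_dim=256, channel_dim=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token-mixing MLP: mixes information across patches, hidden width = token_dim
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_dim),   # weight shape (token_dim, num_patches)
            nn.GELU(),
            nn.Linear(token_dim, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel-mixing MLP: mixes each patch's features, hidden width = channel_dim
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_dim),         # weight shape (channel_dim, dim)
            nn.GELU(),
            nn.Linear(channel_dim, dim),
        )

    def forward(self, x):                        # x: (batch, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)        # (batch, dim, num_patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

x = torch.randn(2, 196, 512)                     # e.g. 196 patches from a 224x224 image, patch size 16
print(MixerBlock()(x).shape)                     # torch.Size([2, 196, 512])
```

With these defaults, the matrix multiplies inside the block have inner dimensions of 512, 256 and 2048, all powers of 2, which is consistent with the memory-alignment argument above; the exact values otherwise just trade off model capacity against compute.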