s4
Paper, Table 1, Convolution number of parameters
Hi, a few things in Table 1 are not fully clear to me. It says convolution has LH parameters. How can that be, if the learnable matrix A alone already has shape L×L? Maybe it is because A is diagonalizable plus low-rank, and we learn only the diagonal and neglect the low-rank part?
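The two readings of "LH" in the question above can be checked with plain arithmetic. This is a minimal sketch under stated assumptions: it supposes the Convolution row counts one length-L kernel per channel (a depthwise convolution spanning the whole sequence), and, for the diagonal hypothesis, it takes the state size N equal to L with one matrix A per channel. The function names are hypothetical.

```python
# Hypothetical parameter counts for the readings of "LH" discussed above.
# L = sequence length, H = number of channels, N = state size.

def depthwise_global_conv_params(L: int, H: int) -> int:
    """Assumed reading: one length-L kernel per channel, no cross-channel mixing."""
    return L * H

def dense_state_matrix_params(N: int, H: int) -> int:
    """One dense N x N matrix A per channel."""
    return N * N * H

def diagonal_state_matrix_params(N: int, H: int) -> int:
    """Only the diagonal of A is learned, one A per channel."""
    return N * H

L, H = 1024, 256
print(depthwise_global_conv_params(L, H))   # 262144
print(dense_state_matrix_params(L, H))      # 268435456 (N = L)
print(diagonal_state_matrix_params(L, H))   # 262144 (N = L)
```

With N = L, the diagonal-only count coincides with LH, whereas a dense L×L matrix per channel would give L²H.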
- In Section 3.1, it says:
  Shouldn't the time complexity be O(N^3 L)?
- In Table 1, why is the number of parameters for S4 H^2 and not LH? After all, Section 3.4 says the number of parameters is L == N, and we need H dimensions, which makes it LH.
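The two candidate costs in the Section 3.1 question can be compared by counting scalar multiplications. This sketch assumes the quantity being costed is the length-L kernel K_k = C A^k B for k = 0, ..., L-1, with A of size N×N, B an N×1 column, C a 1×N row, and discretization omitted; this notation and the helper names are assumptions. Building each power A^k with a matrix-matrix product costs about N^3 per step, while propagating the vector A^k B costs about N^2 per step.

```python
import random

def matmul(X, Y, counter):
    """Triple-loop product; counter[0] accumulates scalar multiplications."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += X[i][k] * Y[k][j]
                counter[0] += 1
    return out

def kernel_via_matrix_powers(A, B, C, L):
    """K_k = C (A^k B), forming each power A^k by a matrix-matrix product."""
    counter = [0]
    N = len(A)
    P = [[int(i == j) for j in range(N)] for i in range(N)]  # A^0 = I
    K = []
    for k in range(L):
        K.append(matmul(C, matmul(P, B, counter), counter)[0][0])
        if k < L - 1:
            P = matmul(P, A, counter)  # N^3 multiplications per step
    return K, counter[0]

def kernel_via_matrix_vector(A, B, C, L):
    """K_k = C v_k with v_0 = B and v_{k+1} = A v_k (matrix-vector products only)."""
    counter = [0]
    v = B
    K = []
    for k in range(L):
        K.append(matmul(C, v, counter)[0][0])
        if k < L - 1:
            v = matmul(A, v, counter)  # N^2 multiplications per step
    return K, counter[0]

random.seed(0)
N, L = 4, 8
# Small integer entries keep the arithmetic exact, so the kernels compare equal.
A = [[random.randint(-1, 1) for _ in range(N)] for _ in range(N)]
B = [[random.randint(-1, 1)] for _ in range(N)]  # N x 1 column
C = [[random.randint(-1, 1) for _ in range(N)]]  # 1 x N row

K1, cost1 = kernel_via_matrix_powers(A, B, C, L)
K2, cost2 = kernel_via_matrix_vector(A, B, C, L)
print(K1 == K2, cost1, cost2)  # True 608 144
```

For general N and L the two counts are L(N^2 + N) + (L - 1)N^3 and LN + (L - 1)N^2, i.e. O(N^3 L) versus O(N^2 L), and both routes produce the same kernel.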