Xiuying Wei
Results: 2 issues of Xiuying Wei
In Proposition 3.2, it is MM* and (MM*)^2 that represent convolutions, Fourier transforms, and other efficient linear transforms. However, the experiments seem to use only the matrix M = PLP^TR, thus...
### ❓ The question

Suppose I need to mid-train the 7B model. How can we enable tensor parallelism with the current qk norm? Because currently...
type/question