Amir Zait
The problem is that StableLM uses the GPT-NeoX architecture with use_parallel_residual=True (so each block computes x + mlp(x) + attn(x)), whereas RedPajama uses use_parallel_residual=False. I implemented it in https://github.com/amirza1/ggml and...
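
To make the difference concrete, here is a minimal Python sketch of the two residual layouts. It is not the code from the ggml fork linked above. The function name `neox_block` and the toy `attn` and `mlp` stand-ins are hypothetical, layer norms are left out to match the simplified formula in the comment, and the sequential branch assumes the usual GPT-NeoX behaviour when the flag is off, where the MLP reads the attention output rather than the block input.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def neox_block(x: Vector, attn: Sublayer, mlp: Sublayer,
               use_parallel_residual: bool) -> Vector:
    """Hypothetical helper: one transformer block with layer norms omitted."""
    if use_parallel_residual:
        # Parallel: both sublayers read the same input x.
        return add(add(x, attn(x)), mlp(x))
    # Sequential (assumed layout): the MLP reads the attention output.
    h = add(x, attn(x))
    return add(h, mlp(h))


# Toy stand-ins for the real sublayers, for illustration only.
def toy_attn(v: Vector) -> Vector:
    return [0.5 * t for t in v]


def toy_mlp(v: Vector) -> Vector:
    return [t * t for t in v]


x = [1.0, 2.0]
print(neox_block(x, toy_attn, toy_mlp, use_parallel_residual=True))   # [2.5, 7.0]
print(neox_block(x, toy_attn, toy_mlp, use_parallel_residual=False))  # [3.75, 12.0]
```

The same weights give different outputs under the two layouts, so running a checkpoint trained with one setting through a graph built for the other would be expected to produce wrong results.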