Amir Zait

1 comment by Amir Zait

The problem is that StableLM uses the GPT-NeoX architecture with use_parallel_residual=True (so that each block computes x + mlp(x) + attn(x)), whereas RedPajama uses use_parallel_residual=False. I implemented it in https://github.com/amirza1/ggml and...
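To make the difference concrete, here is a minimal Python sketch of the two residual layouts. It is not the ggml code from the linked repository. The function `neox_block` and the toy `attn`, `mlp`, and `identity` layers are hypothetical stand-ins chosen for illustration. The sketch assumes the usual GPT-NeoX arrangement, where each sublayer sees a layer-normalized input; the formula above leaves the layer norms out for brevity.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def neox_block(x: Vector, attn: Layer, mlp: Layer, ln1: Layer, ln2: Layer,
               use_parallel_residual: bool) -> Vector:
    """One transformer block in either residual layout (illustrative only)."""
    if use_parallel_residual:
        # Parallel: attention and MLP both read the same block input x.
        return add(add(x, attn(ln1(x))), mlp(ln2(x)))
    # Sequential: the MLP reads the output of the attention residual.
    h = add(x, attn(ln1(x)))
    return add(h, mlp(ln2(h)))


# Toy stand-ins (assumptions, not real layers) so the difference is visible.
def identity(v: Vector) -> Vector:
    return list(v)


def attn(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def mlp(v: Vector) -> Vector:
    return [t * t for t in v]


x = [1.0, 2.0]
parallel = neox_block(x, attn, mlp, identity, identity, True)
sequential = neox_block(x, attn, mlp, identity, identity, False)
print(parallel)    # [4.0, 10.0]
print(sequential)  # [12.0, 42.0]
```

The same weights give different outputs under the two layouts, so a model trained with one setting will produce wrong activations if it is run with the other.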