gpt-neox
Add `intermediate_size` to GPT-NeoX models
The current implementation only allows setting `intermediate_size`
for Llama models, but I would like to be able to change `intermediate_size`
in GPT-NeoX models as well.
I have tested this implementation with a quick training run, inference, and checkpoint conversion, and it doesn't appear to introduce any bugs. I hope it helps!
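For illustration, here is a minimal sketch of the configurable feed-forward width this PR describes. The class and argument names are hypothetical, not gpt-neox's actual identifiers; the sketch just assumes the conventional 4x expansion as the fallback when `intermediate_size` is unset.

```python
# Hypothetical sketch (not the actual gpt-neox code): an MLP whose hidden
# width falls back to 4 * hidden_size when intermediate_size is not given.
from typing import Optional

import torch
import torch.nn as nn


class MLPSketch(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: Optional[int] = None):
        super().__init__()
        # Default to the conventional 4x expansion when unspecified.
        ff_dim = intermediate_size if intermediate_size is not None else 4 * hidden_size
        self.dense_h_to_4h = nn.Linear(hidden_size, ff_dim)
        self.dense_4h_to_h = nn.Linear(ff_dim, hidden_size)
        self.activation = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dense_4h_to_h(self.activation(self.dense_h_to_4h(x)))
```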
This looks good. It needs to be made consistent with Mamba and RWKV, and we also need some TODO comments about revisiting this once we add SwiGLU.
@jahatef is on it
Added support for Mamba and RWKV, and added the TODOs.
With these changes, is there still a point to having separate Linear and LLaMA Linear definitions? At a glance, all the differences look configurable; the only remaining distinction is the default behavior when options are unspecified.
Refactored the activations and the MLP layer to remove our redundant LLaMA MLP class, and added some activation functions from https://arxiv.org/pdf/2002.05202.
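As a reference, here is a minimal sketch of one of the gated activation variants from that paper (Shazeer, 2020), SwiGLU. The class and weight names are illustrative assumptions, not gpt-neox's actual identifiers; the gated variants use two input projections, which is why the SwiGLU discussion above ties into the `intermediate_size` sizing.

```python
# Hypothetical sketch of a SwiGLU feed-forward layer as described in
# https://arxiv.org/pdf/2002.05202; names here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Gated variants use two input projections: one gated, one linear.
        self.w_gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: swish(x @ W_gate) * (x @ W_up), then project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```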