gpt-neox
Add `intermediate_size` to GPT-NeoX models
The current implementation only allows setting `intermediate_size`
for Llama models, but I would like to be able to change `intermediate_size`
in GPT-NeoX models as well.
I have tested this implementation with a quick training run, inference, and checkpoint conversion, and it doesn't appear to introduce any bugs. I hope it helps!
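For illustration, here is a minimal sketch of the configurable feed-forward width this PR describes. The class and argument names are hypothetical, not gpt-neox's actual identifiers; the sketch just assumes the conventional 4x expansion as the fallback when `intermediate_size` is unset.

```python
# Hypothetical sketch (not the actual gpt-neox code): an MLP whose hidden
# width falls back to 4 * hidden_size when intermediate_size is not given.
from typing import Optional

import torch
import torch.nn as nn


class MLPSketch(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: Optional[int] = None):
        super().__init__()
        # Default to the conventional 4x expansion when unspecified.
        ff_dim = intermediate_size if intermediate_size is not None else 4 * hidden_size
        self.dense_h_to_4h = nn.Linear(hidden_size, ff_dim)
        self.dense_4h_to_h = nn.Linear(ff_dim, hidden_size)
        self.activation = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dense_4h_to_h(self.activation(self.dense_h_to_4h(x)))
```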
This looks good. It needs to be made consistent with Mamba and RWKV, and we also need some TODO comments about revisiting this once we add SwiGLU.
@jahatef is on it
Added support for Mamba and RWKV, and added the TODOs.
With these changes, is there still a point to having separate Linear and LLaMA Linear definitions? At a glance, all the differences look configurable; the only remaining distinction is the default behavior when options are unspecified.
Refactored the activations and the MLP layer to remove our redundant LLaMA MLP class, and added some activation functions from https://arxiv.org/pdf/2002.05202.
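As a reference, here is a minimal sketch of one of the gated activation variants from that paper (Shazeer, 2020), SwiGLU. The class and weight names are illustrative assumptions, not gpt-neox's actual identifiers; the gated variants use two input projections, which is why the SwiGLU discussion above ties into the `intermediate_size` sizing.

```python
# Hypothetical sketch of a SwiGLU feed-forward layer as described in
# https://arxiv.org/pdf/2002.05202; names here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Gated variants use two input projections: one gated, one linear.
        self.w_gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: swish(x @ W_gate) * (x @ W_up), then project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```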