Easy-Transformer
Add Llama-2 models
Being addressed in https://github.com/neelnanda-io/TransformerLens/pull/352
- [x] Implement Llama-2-7B and Llama-2-13B
- [ ] Implement Llama-2-70B architecture (add Grouped-Query Attention)
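For context on the remaining checklist item: in grouped-query attention, several query heads share a single key/value head, which shrinks the KV cache relative to full multi-head attention. A minimal PyTorch sketch, where the function name and tensor layout are illustrative and not TransformerLens internals:

```python
import torch


def grouped_query_attention(q, k, v):
    """Toy grouped-query attention.

    q: [batch, n_heads, seq, d_head]
    k, v: [batch, n_kv_heads, seq, d_head], with n_heads % n_kv_heads == 0.
    Each group of n_heads // n_kv_heads query heads attends to the same
    key/value head.
    """
    d_head = q.shape[-1]
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    repeat = n_heads // n_kv_heads
    # Broadcast each K/V head to the query heads in its group.
    k = k.repeat_interleave(repeat, dim=1)
    v = v.repeat_interleave(repeat, dim=1)
    scores = (q @ k.transpose(-1, -2)) / d_head**0.5
    pattern = scores.softmax(dim=-1)
    return pattern @ v
```

With `n_kv_heads == n_heads` this reduces to ordinary multi-head attention; Llama-2-70B uses 64 query heads over 8 KV heads.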
Out of curiosity, does TransformerLens currently support models that need multiple devices to run inference (i.e. models too big to fit in one device's RAM)? If not, is this the main bottleneck to implementing the Llama-2-70B architecture?
Sorry if this feature has already been added; I've been skimming the docs/issues and haven't been able to find it yet.
It already supports multiple devices; you just need to pass the n_devices parameter to from_pretrained. The bottleneck on Llama-2-70B was grouped-query attention, which is currently being added to support Mistral, so it should be easy to add Llama-2-70B soon.
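As a rough illustration of what the n_devices option involves, here is a toy sketch of splitting a model's layers across devices in contiguous runs. The helper name is hypothetical, and this is not TransformerLens's actual implementation:

```python
def assign_layers_to_devices(n_layers, n_devices):
    """Map each of n_layers transformer blocks to one of n_devices,
    keeping contiguous runs of layers on the same device and spreading
    any remainder over the earliest devices."""
    per_device = n_layers // n_devices
    extra = n_layers % n_devices
    mapping = {}
    layer = 0
    for device in range(n_devices):
        # The first `extra` devices each take one additional layer.
        count = per_device + (1 if device < extra else 0)
        for _ in range(count):
            mapping[layer] = device
            layer += 1
    return mapping


# E.g. Llama-2-70B has 80 layers; over 4 GPUs this gives 20 layers each.
print(assign_layers_to_devices(80, 4))
```

In TransformerLens itself you would just pass something like `HookedTransformer.from_pretrained("meta-llama/Llama-2-70b-hf", n_devices=4)` and let the library handle placement.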