
Add Llama-2 models

Open · ArthurConmy opened this issue 1 year ago · 2 comments

Being addressed in https://github.com/neelnanda-io/TransformerLens/pull/352

- [x] Implemented Llama-2-7B and Llama-2-13B
- [ ] Implement the Llama-2-70B architecture (add Grouped-Query Attention; see the sketch below)

ArthurConmy · Jul 22 '23 18:07
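
For readers unfamiliar with the missing piece, here is a minimal, illustrative sketch of Grouped-Query Attention: several query heads share each key/value head, which shrinks the KV projections. This is a standalone PyTorch example, not TransformerLens code, and it omits the causal mask for brevity.

```python
import torch

def grouped_query_attention(q, k, v, n_query_heads_per_kv: int):
    # q: [batch, n_q_heads, seq, d_head]
    # k, v: [batch, n_kv_heads, seq, d_head], where n_q_heads = n_kv_heads * n_query_heads_per_kv
    # Broadcast each KV head to the group of query heads that shares it.
    k = k.repeat_interleave(n_query_heads_per_kv, dim=1)
    v = v.repeat_interleave(n_query_heads_per_kv, dim=1)
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5  # scaled dot-product
    pattern = scores.softmax(dim=-1)                        # causal mask omitted for brevity
    return pattern @ v                                      # [batch, n_q_heads, seq, d_head]
```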

Out of curiosity, does TransformerLens currently support models that need multiple devices to run inference (i.e., the model is too big to fit in one device's RAM)? If not, is this the main bottleneck to implementing the Llama-2-70B architecture?

Sorry if this feature has already been added; I've been skimming the docs/issues and I haven't been able to find it yet.

msakarvadia · Nov 03 '23 22:11

It already supports multiple devices; you need to pass the n_devices parameter to from_pretrained. The bottleneck on Llama-2-70B was grouped-query attention, which is currently being added to support Mistral, so it should be easy to add Llama-2-70B soon.


neelnanda-io · Nov 03 '23 23:11
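
A minimal sketch of the multi-device loading described in the reply above, assuming the model name string shown here; the exact name registered for Llama-2 in your TransformerLens version may differ.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "Llama-2-13b",   # assumed model name; substitute the name registered in your version
    n_devices=2,     # split the model's layers across 2 devices
)

# Inference runs across both devices transparently.
loss = model("TransformerLens splits layers across devices", return_type="loss")
```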