Easy-Transformer
Add Llama-2 models
Being addressed in https://github.com/neelnanda-io/TransformerLens/pull/352
- [x] Implement Llama-2-7B and Llama-2-13B
- [ ] Implement Llama-2-70B architecture (add Grouped-Query Attention)
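For context on the remaining checklist item: in grouped-query attention, several query heads share a single key/value head, which shrinks the KV cache relative to full multi-head attention. A minimal PyTorch sketch, where the function name and tensor layout are illustrative and not TransformerLens internals:

```python
import torch


def grouped_query_attention(q, k, v):
    """Toy grouped-query attention.

    q: [batch, n_heads, seq, d_head]
    k, v: [batch, n_kv_heads, seq, d_head], with n_heads % n_kv_heads == 0.
    Each group of n_heads // n_kv_heads query heads attends to the same
    key/value head.
    """
    d_head = q.shape[-1]
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    repeat = n_heads // n_kv_heads
    # Broadcast each K/V head to the query heads in its group.
    k = k.repeat_interleave(repeat, dim=1)
    v = v.repeat_interleave(repeat, dim=1)
    scores = (q @ k.transpose(-1, -2)) / d_head**0.5
    pattern = scores.softmax(dim=-1)
    return pattern @ v
```

With `n_kv_heads == n_heads` this reduces to ordinary multi-head attention; Llama-2-70B uses 64 query heads over 8 KV heads.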
Out of curiosity, does TransformerLens currently support models that need multiple devices to run inference (i.e. models too big to fit in one device's RAM)? If not, is this the main bottleneck to implementing the Llama-2-70B architecture?
Sorry if this feature has already been added; I've been skimming the docs/issues and haven't been able to find it yet.
It already supports multiple devices; you just need to pass the n_devices parameter to from_pretrained. The bottleneck on Llama-2-70B was grouped-query attention, which is currently being added to support Mistral, so it should be easy to add Llama-2-70B soon.
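As a rough illustration of what the n_devices option involves, here is a toy sketch of splitting a model's layers across devices in contiguous runs. The helper name is hypothetical, and this is not TransformerLens's actual implementation:

```python
def assign_layers_to_devices(n_layers, n_devices):
    """Map each of n_layers transformer blocks to one of n_devices,
    keeping contiguous runs of layers on the same device and spreading
    any remainder over the earliest devices."""
    per_device = n_layers // n_devices
    extra = n_layers % n_devices
    mapping = {}
    layer = 0
    for device in range(n_devices):
        # The first `extra` devices each take one additional layer.
        count = per_device + (1 if device < extra else 0)
        for _ in range(count):
            mapping[layer] = device
            layer += 1
    return mapping


# E.g. Llama-2-70B has 80 layers; over 4 GPUs this gives 20 layers each.
print(assign_layers_to_devices(80, 4))
```

In TransformerLens itself you would just pass something like `HookedTransformer.from_pretrained("meta-llama/Llama-2-70b-hf", n_devices=4)` and let the library handle placement.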