Fares Abawi
I was able to run 7B on two 1080 Tis (inference only). Next, I'll try 13B and 33B. It still needs refining, but it works! I forked LLaMA here: https://github.com/modular-ml/wrapyfi-examples_llama...
> No chance

You can do it with Wrapyfi:

# LLaMA with Wrapyfi

Wrapyfi enables distributing LLaMA (inference only) on multiple GPUs/machines, each with less than 16GB VRAM. **currently distributes...
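Roughly, the split is pipeline-style: each worker owns a contiguous run of transformer blocks, and only the hidden-state activations cross the GPU/machine boundary. A minimal sketch of that partitioning idea, with hypothetical names (this is not the fork's actual code):

```python
# Illustrative sketch of the splitting idea (hypothetical names, not the
# fork's actual code): assign contiguous transformer blocks to each worker.
import torch.nn as nn

def partition_blocks(blocks: nn.ModuleList, n_workers: int, rank: int) -> nn.ModuleList:
    """Return the contiguous slice of transformer blocks owned by `rank`."""
    per_worker = (len(blocks) + n_workers - 1) // n_workers  # ceil division
    start = rank * per_worker
    return nn.ModuleList(blocks[start:start + per_worker])

# e.g. LLaMA-7B has 32 blocks; with 2 workers, rank 0 runs blocks 0-15 on
# one 1080 Ti and rank 1 runs blocks 16-31 on the other, so only the
# hidden states need to be exchanged between them.
```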
@Jehuty-ML this might have to do with their recent update to the sequence length (1024 to 2048). Also, try changing the batch size to 2 and reducing the example prompts to...
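For reference, a hedged sketch of those knobs, assuming the argument names from the stock facebookresearch/llama example.py (verify in your copy; names may have changed with the update):

```python
# Hedged sketch: the knobs referred to above, assuming the stock
# facebookresearch/llama example.py (verify the names in your copy).
from example import load  # hypothetical import; example.py sits at the repo root

generator = load(
    ckpt_dir="ckpts/7B",                      # path to the downloaded weights
    tokenizer_path="ckpts/tokenizer.model",
    local_rank=0,
    world_size=1,
    max_seq_len=1024,     # the repo recently moved this toward 2048
    max_batch_size=2,     # a smaller batch eases VRAM pressure
)

prompts = ["I believe the meaning of life is"]  # trim the prompt list too
results = generator.generate(prompts, max_gen_len=256, temperature=0.8, top_p=0.95)
```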
You can distribute the model across two machines or GPUs and transmit the activations over ZeroMQ. Follow these instructions:

# LLaMA with Wrapyfi

Wrapyfi enables distributing LLaMA (inference only) on...
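To make the ZeroMQ part concrete, here is a minimal sketch of shipping activations between two processes with pyzmq; the helper names are hypothetical, and this is not Wrapyfi's actual API:

```python
# Minimal sketch (not Wrapyfi's actual API): moving a layer's activations
# between two pipeline stages over ZeroMQ with pyzmq.
import io

import torch
import zmq

def send_tensor(socket: zmq.Socket, tensor: torch.Tensor) -> None:
    """Serialize a tensor and push its bytes over the socket."""
    buffer = io.BytesIO()
    torch.save(tensor.cpu(), buffer)  # move off-GPU before serializing
    socket.send(buffer.getvalue())

def recv_tensor(socket: zmq.Socket, device: str = "cuda:0") -> torch.Tensor:
    """Receive serialized bytes and rebuild the tensor on `device`."""
    data = socket.recv()
    return torch.load(io.BytesIO(data)).to(device)

# Stage 0 (first half of the model) pushes activations:
#   ctx = zmq.Context(); sock = ctx.socket(zmq.PUSH); sock.bind("tcp://*:5555")
#   send_tensor(sock, hidden_states)
# Stage 1 (second half) pulls them and keeps going:
#   ctx = zmq.Context(); sock = ctx.socket(zmq.PULL); sock.connect("tcp://host0:5555")
#   hidden_states = recv_tensor(sock)
```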
> Same for me, but in my case I have 2x RTX 2070 (8 GB each), 16 GB in total. How could we use multiple GPUs?
>
> ```
> # |...