Fares Abawi

26 comments by Fares Abawi

I was able to run 7B on two 1080 Ti GPUs (inference only). Next, I'll try 13B and 33B. It still needs refining, but it works! I forked LLaMA here: https://github.com/modular-ml/wrapyfi-examples_llama...

> No chance

You can do it with Wrapyfi.

# LLaMA with Wrapyfi

Wrapyfi enables distributing LLaMA (inference only) on multiple GPUs/machines, each with less than 16GB VRAM. **currently distributes...

@Jehuty-ML this might have to do with their recent update to the sequence length (1024 to 2048). Also, try changing the batch size to 2 and reduce the example prompts to...
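For reference, this is roughly the change being suggested, as a minimal sketch: `max_seq_len` and `max_batch_size` are the field names in the reference LLaMA `ModelArgs`, while the exact values below are only illustrative.

```python
from llama.model import ModelArgs

# The per-layer KV cache is allocated with shape
# (max_batch_size, max_seq_len, n_heads, head_dim), so shrinking
# either field directly reduces the VRAM it occupies.
model_args = ModelArgs(
    max_seq_len=1024,   # back down from the updated default of 2048
    max_batch_size=2,   # small batch to match the trimmed example prompts
)
```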

You can distribute the model across two machines or GPUs and transmit the activations over ZeroMQ. Follow these instructions:

# LLaMA with Wrapyfi

Wrapyfi enables distributing LLaMA (inference only) on...
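Wrapyfi wraps the messaging for you, but the underlying idea can be sketched with plain pyzmq: machine A runs the first half of the transformer layers, serializes the hidden-state tensor, and machine B's half consumes it. This is only a minimal sketch under those assumptions; the address, port, and `first_half`/`second_half` helpers are hypothetical, not Wrapyfi's actual API.

```python
import io
import torch
import zmq

def send_activations(sock: zmq.Socket, h: torch.Tensor) -> None:
    """Serialize a hidden-state tensor and push it to the next stage."""
    buf = io.BytesIO()
    torch.save(h.cpu(), buf)
    sock.send(buf.getvalue())

def recv_activations(sock: zmq.Socket, device: str = "cuda:0") -> torch.Tensor:
    """Receive a serialized tensor and move it onto the local GPU."""
    return torch.load(io.BytesIO(sock.recv())).to(device)

ctx = zmq.Context()

# Machine A (layers 0..N/2): push activations downstream.
tx = ctx.socket(zmq.PUSH)
tx.connect("tcp://machine-b:5555")   # hypothetical address/port
# h = first_half(tokens)             # output of the locally held layers
# send_activations(tx, h)

# Machine B (layers N/2..N): pull activations, finish the forward pass.
rx = ctx.socket(zmq.PULL)
rx.bind("tcp://*:5555")
# h = recv_activations(rx)
# logits = second_half(h)
```

In the fork itself, Wrapyfi's publisher/listener registration takes care of this exchange over ZeroMQ, so you only configure which machine holds which layers.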

> Same for me, but in my case I have 2x RTX 2070 (8 GB each), 16 GB in total. How could we use multiple GPUs?
>
> ```
> # |...