
Who can run the 7B model on `windows11` with `RTX3080ti`?

Open zhuxiujia opened this issue 1 year ago • 7 comments

I can run llama, but CUDA runs out of memory. Who can run the 7B model on windows11 with an RTX3080ti? Other projects don't seem to have Windows versions.

zhuxiujia avatar Mar 07 '23 15:03 zhuxiujia

I loaded the model using Colab Pro+ and the 7B model consumed around 24 GB of VRAM. If you just want to load the model and don't expect super-accurate results (because model partitioning is done in cell 2 of the notebook), the instructions are in this Colab Notebook.

ajaysurya1221 avatar Mar 08 '23 04:03 ajaysurya1221

Can you try reducing the batch size and see if it works?

AAnirudh07 avatar Mar 10 '23 08:03 AAnirudh07

> Can you try reducing the batch size and see if it works?

Even after reducing the parameters, it still reports CUDA out of memory.
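
A quick back-of-the-envelope check suggests why shrinking the batch size alone cannot help here (assuming the 12 GB desktop RTX 3080 Ti and LLaMA-7B's roughly 6.7B parameters):

```python
# Rough VRAM estimate for LLaMA-7B weights in fp16 (assumption: 12 GB desktop 3080 Ti).
# The weights alone roughly fill the card before the KV cache, activations and the
# CUDA context are counted, so reducing the batch size cannot avoid the OOM;
# int8/int4 quantization or CPU offload is needed instead.
params = 6.7e9           # approximate LLaMA "7B" parameter count
bytes_per_param = 2      # fp16
weights_gib = params * bytes_per_param / 1024**3
print(f"fp16 weights: {weights_gib:.1f} GiB vs ~12 GiB on an RTX 3080 Ti")  # ~12.5 GiB
```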

zhuxiujia avatar Mar 10 '23 08:03 zhuxiujia

Hmm, in that case, these links might be helpful (assuming you haven't tried int8 or int4 implementations yet):

  1. https://github.com/facebookresearch/llama/issues/79#issuecomment-1460464011 - This GitHub user was able to run llama-65B using an RTX 3070ti GPU
  2. https://github.com/facebookresearch/llama-recipes/issues/171 - Several users were able to perform model inference using the bitsandbytes library
  3. https://github.com/TimDettmers/bitsandbytes/ - you can use this lib for LLM.int8() inference (see the sketch after this list)
  4. https://github.com/tloen/llama-int8 - int8 LLaMA implementation
  5. https://github.com/qwopqwop200/GPTQ-for-LLaMa - int4 LLaMA implementation
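
For item 3, a minimal sketch of LLM.int8() inference through the Hugging Face transformers + bitsandbytes integration; it assumes an HF-format conversion of the 7B weights (the decapoda-research/llama-7b-hf checkpoint mentioned later in this thread is one such conversion) and `pip install transformers accelerate bitsandbytes`. In 8-bit the 7B weights drop to roughly 7-8 GB, which fits a 12 GB card:

```python
# Hedged sketch: LLM.int8() inference with bitsandbytes via transformers.
# The model id and prompt are illustrative; any HF-format LLaMA-7B checkpoint should work.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "decapoda-research/llama-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    device_map="auto",    # let accelerate place the layers on the GPU
    load_in_8bit=True,    # bitsandbytes 8-bit quantization of the weights
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```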

AAnirudh07 avatar Mar 10 '23 09:03 AAnirudh07

I am able to run on Windows 11 with a 3060. You can check out my repo https://github.com/public-git-ui/st-llama and ignore the web UI bits. On my machine it's really slow, about 3 tokens per second after warm-up.

public-git-ui avatar Mar 10 '23 21:03 public-git-ui

I did it here with a single Python script: https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py

Tongjilibo avatar Mar 17 '23 16:03 Tongjilibo

> I can run llama, but CUDA runs out of memory. Who can run the 7B model on windows11 with an RTX3080ti? Other projects don't seem to have Windows versions.

Got it running on my RTX-3080 (16 GB) with `model = transformers.LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf", torch_dtype=torch.float16).to('cuda')`
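
For completeness, a sketch of what the rest of that fp16 setup might look like (the tokenizer class, prompt, and sampling settings are assumptions, not part of the comment above). Note that the fp16 weights alone are about 13 GB, so this fits a 16 GB card but is too tight for the 12 GB desktop 3080 Ti:

```python
# Hypothetical completion of the fp16 recipe above; prompt and generation settings are illustrative.
import torch
import transformers

model_id = "decapoda-research/llama-7b-hf"

tokenizer = transformers.LlamaTokenizer.from_pretrained(model_id)
model = transformers.LlamaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("Simply put, the theory of relativity states that", return_tensors="pt").to("cuda")
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=48, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```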

marburps avatar Mar 21 '23 21:03 marburps

The issue should be resolved by the suggestions above. Please re-open as needed.

WuhanMonkey avatar Sep 06 '23 17:09 WuhanMonkey