Who can run the 7B model on `windows11` with an `RTX3080ti`?
I can run llama, but CUDA runs out of memory. Who can run the 7B model on Windows 11 with an RTX 3080 Ti? Other projects don't seem to have Windows versions?
I loaded the model using Colab Pro+ and the 7B model consumed around 24 GB of VRAM. If you just want to load the model and don't expect super-accurate results (because model partitioning is done in cell 2 of the notebook), the instructions are in the linked Colab Notebook.
Can you try reducing the batch size and see if it works?
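For reference, a rough sketch of what "reducing the batch size" means with the reference example.py in this repo is shown below; the `max_seq_len` / `max_batch_size` parameter names match the script's defaults, but the checkpoint and tokenizer paths are placeholders. The model pre-allocates its KV cache for the full `max_batch_size × max_seq_len`, so shrinking both is what actually frees VRAM.

```python
# Sketch only: call the reference example with a smaller context and batch size.
# Launch it the usual way, e.g. `torchrun --nproc_per_node 1 run_small.py` from the
# repo root, since example.py sets up fairscale model parallelism from the torchrun
# environment variables.
import example  # example.py from facebookresearch/llama

example.main(
    ckpt_dir="path/to/7B",                     # placeholder checkpoint directory
    tokenizer_path="path/to/tokenizer.model",  # placeholder tokenizer path
    max_seq_len=256,   # default 512; the KV cache is sized by this
    max_batch_size=1,  # default 32; the cache is pre-allocated for the whole batch
)
```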
Even though I narrowed down the parameters, it still reports CUDA out of memory.
Hmm, in that case, these links might be helpful (assuming you haven't tried int8 or int4 implementations yet); there's a rough int8 loading sketch right after the list:
- https://github.com/facebookresearch/llama/issues/79#issuecomment-1460464011 - this GitHub user was able to run llama-65B using an RTX 3070 Ti GPU
- https://github.com/facebookresearch/llama-recipes/issues/171 - several users were able to perform model inference using the `bitsandbytes` library
- https://github.com/TimDettmers/bitsandbytes/ - you can use this lib for `LLM.int8()` inference
- https://github.com/tloen/llama-int8 - int8 LLaMA implementation
- https://github.com/qwopqwop200/GPTQ-for-LLaMa - int4 LLaMA implementation
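For the bitsandbytes route, a minimal sketch using the Hugging Face transformers wrapper looks roughly like this. It assumes the unofficial `decapoda-research/llama-7b-hf` conversion (any HF-format LLaMA checkpoint should work) and that transformers, accelerate, and bitsandbytes are installed; in 8-bit the 7B weights take roughly 7-8 GB, which should fit a 12 GB 3080 Ti.

```python
# Rough sketch of 8-bit (LLM.int8()) loading via bitsandbytes + transformers.
# Assumes: pip install transformers accelerate bitsandbytes
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

name = "decapoda-research/llama-7b-hf"  # unofficial HF conversion; swap in your own
tokenizer = LlamaTokenizer.from_pretrained(name)
model = LlamaForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,   # weights are quantized to int8 on load
    device_map="auto",   # accelerate places layers on GPU/CPU as they fit
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```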
I am able to run it on Windows 11 with a 3060. You can check out my repo https://github.com/public-git-ui/st-llama; you can ignore the web UI bits. On my machine it's really slow, about 3 tokens per second after warm-up.
I did it here with a single Python script: https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py
> I can run llama, but CUDA runs out of memory. Who can run the 7B model on Windows 11 with an RTX 3080 Ti? Other projects don't seem to have Windows versions?
Got it running on my RTX-3080 (16 GB) with `model = transformers.LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf", torch_dtype=torch.float16).to('cuda')`
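For completeness, here is a self-contained sketch around that one-liner with a peak-memory readout (the prompt and token counts are arbitrary). Note that the 7B weights alone are about 13.5 GB in fp16, so this fits a 16 GB card but will likely still OOM on a 12 GB 3080 Ti; the int8/int4 options above are the safer bet there.

```python
# Sketch only: fp16 load of the same checkpoint on a single GPU, plus a VRAM report.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

name = "decapoda-research/llama-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(name)
model = LlamaForCausalLM.from_pretrained(name, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Building a website can be done in 10 simple steps:",
                   return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```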
The issue should be resolved by the suggestions above. Please re-open as needed.