llama
Inference code for Llama models
Our server has 2× A100 (80GB), 2× A6000 (49GB), and 2× A5000 (24GB). Currently, without any modification, we can run at most the 30B model, because by default the 65B model requires...
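A rough back-of-the-envelope sketch of why the 65B model exceeds a single 80GB card at fp16: each parameter takes 2 bytes, so weights alone already pass 80 GiB before activations or cache. The parameter counts below are approximate figures for illustration, not values read from the repo.

```python
# Rough sketch: fp16 weight memory per LLaMA size (2 bytes per parameter).
# Parameter counts are approximate; activations, KV cache, and framework
# overhead are NOT included, so real usage is higher.
PARAMS = {"7B": 7e9, "13B": 13e9, "30B": 32.5e9, "65B": 65.2e9}

def weight_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GiB at the given precision."""
    return n_params * bytes_per_param / 2**30

for size, n in PARAMS.items():
    print(f"{size}: ~{weight_gib(n):.0f} GiB of weights in fp16")
```

By this estimate the 65B weights alone are well over 100 GiB at fp16, which matches the observation that it does not fit on a single 80GB A100 without sharding or quantization.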
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

$ uname -m
x86_64

$ lsb_release -a
Distributor...
```
Once I had completed the installation and tried a test with test.py using the 8B model, I got the following error:
```
(base) lorenzo@lorenzo-desktop:~/Desktop/llama$ torchrun --nproc_per_node 1 example.py --ckpt_dir ./model/model_size...
```
This issue is related to issue #49. The third-largest model size in the paper and the README is 33B, but in `download.sh` it is 30B. Line 5: `MODEL_SIZE="7B,13B,30B,65B"`; line 12: `N_SHARD_DICT["30B"]="3"`.
Just wondering what cool projects people will be making with this. I have some good ideas, such as trying to combine it with a math engine to make it genius...
An excerpt from the original research paper, "LLaMA-65B outperforms Chinchilla-70B on all reported benchmarks but BoolQ", is inconsistent with the results shared in Table 3: Zero-shot performance on Common Sense...
Thanks for the amazing work. I wonder whether the weights of the model's LM head are tied with its word embeddings. From the code, it...
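For readers unsure what "tied" means here: tied weights are a single shared parameter object used by both the embedding and the LM head, not two equal copies. A minimal pure-Python sketch of the distinction (toy values, no framework):

```python
# Toy illustration of weight tying: the LM head and the embedding refer to
# the SAME underlying object, so mutating one is visible through the other.
embedding_weight = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-token, 2-dim embedding
lm_head_weight = embedding_weight            # tied: same object, not a copy

embedding_weight[0][0] = 9.9                 # update the embedding...
tied = lm_head_weight[0][0] == 9.9           # ...and the head sees it: tied

untied_head = [row[:] for row in embedding_weight]  # a deep copy is NOT tied
embedding_weight[0][0] = 0.0
still_copy = untied_head[0][0] == 9.9        # copy kept the old value
```

In PyTorch one would check this by comparing the two parameters' `data_ptr()` (equal pointers mean tied storage); the layer names involved depend on the model implementation, so inspect the module definitions in this repo to find them.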
Hello, I cannot understand this part of the review email: "Save bandwidth by using a torrent to distribute more efficiently". Can you tell me how to download the model? Thanks.
Not sure how long I can keep this running https://huggingface.co/spaces/chansung/LLaMA-13B
I have seen someone in this issue tracker say that the 7B model needs just 8.5G of VRAM, but when I ran example.py it returned out of memory on a 24G...
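One plausible explanation, sketched below under assumptions: figures like "8.5G" usually refer to quantized weights, while the stock fp16 7B weights are around 13 GiB, and the preallocated KV cache comes on top of that. The hyperparameters here (32 layers, model dim 4096, `max_seq_len=512`, `max_batch_size=32`) are assumed typical 7B values and example defaults, not read from the repo.

```python
# Back-of-the-envelope KV-cache size for a 7B-class model, assuming
# 32 layers, model dim 4096, max_seq_len=512, max_batch_size=32, fp16.
# All values are illustrative assumptions.
def kv_cache_bytes(n_layers=32, max_batch=32, max_seq=512, dim=4096,
                   bytes_per_elem=2):
    # K and V caches are each a (batch, seq, dim) fp16 tensor per layer.
    return 2 * n_layers * max_batch * max_seq * dim * bytes_per_elem

cache_gib = kv_cache_bytes() / 2**30
print(f"KV cache: ~{cache_gib:.1f} GiB on top of the ~13 GiB of fp16 weights")
```

Under these assumptions the cache alone is about 8 GiB, so weights plus cache already approach 24G before activations, which would explain an OOM; lowering `max_batch_size` or `max_seq_len` shrinks the cache proportionally.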