
Inference code for Llama models

Results: 412 llama issues

Our server has A100×2 (80GB), A6000×2 (48GB), and A5000×2 (24GB). Currently, without any modification, we can run at most the 30B model, because by default the 65B model requires...
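A rough back-of-the-envelope for why the 65B model is out of reach by default: at 2 bytes per parameter (fp16), the weights alone for 65B are over 120 GiB, and the released 65B checkpoint ships as 8 shards, so the stock inference code expects 8 GPUs. A minimal sketch (parameter counts are the paper's figures; overhead for activations and KV cache comes on top):

```python
# Estimate weight memory for each Llama size at fp16 (2 bytes/param).
# Parameter counts (in billions) from the LLaMA paper; this ignores
# activation memory, KV cache, and CUDA context overhead.
PARAMS_BILLIONS = {"7B": 6.7, "13B": 13.0, "30B": 32.5, "65B": 65.2}

def weight_gib(billions: float, bytes_per_param: int = 2) -> float:
    """GiB needed just to hold the weights at the given precision."""
    return billions * 1e9 * bytes_per_param / 2**30

for name, b in PARAMS_BILLIONS.items():
    print(f"{name}: ~{weight_gib(b):.1f} GiB of weights in fp16")
```

By this estimate the 65B weights (~121 GiB) would fit across 2× A100-80GB in raw capacity, but not without resharding, since the default checkpoint layout assumes model parallelism of 8.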

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

$ uname -m
x86_64

$ lsb_release -a
Distributor...
```

Once I completed the installation and ran a test with the 8B model, I got the following error:

```
(base) lorenzo@lorenzo-desktop:~/Desktop/llama$ torchrun --nproc_per_node 1 example.py --ckpt_dir ./model/model_size...
```

This issue is related to issue #49. The 3rd-largest model size in the paper and README is 33B, but in `download.sh` it is 30B. Line 5: `MODEL_SIZE="7B,13B,30B,65B"`; line 12: `N_SHARD_DICT["30B"]="3"`.

Just wondered what cool projects people will be making with this. I have some good ideas, such as combining it with a math engine to make it a genius...

An excerpt from the original research paper, "LLaMA-65B outperforms Chinchilla-70B on all reported benchmarks but BoolQ", is inconsistent with the results shared in Table 3: Zero-shot performance on Common Sense...

Thanks for the amazing work. I wonder whether the weights of the model's LM head are tied to its word embeddings. From the code, it...
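For readers unfamiliar with the question: "tied" means the output projection reuses the embedding matrix as its weight rather than keeping a separate parameter. A toy dependency-free sketch of the concept (not the repo's actual classes):

```python
class TiedToyLM:
    """Illustrates weight tying: the LM head's weight matrix is the
    same object as the token-embedding table, so updating one updates
    the other. Hypothetical names for illustration only."""

    def __init__(self, vocab_size: int, dim: int):
        # One (vocab_size x dim) matrix serves both roles.
        self.tok_embeddings = [[0.0] * dim for _ in range(vocab_size)]
        self.lm_head_weight = self.tok_embeddings  # tied: same object

    def is_tied(self) -> bool:
        return self.lm_head_weight is self.tok_embeddings
```

In a framework like PyTorch the analogous check is whether the two weight tensors share storage; if they are separate parameters, the model is untied.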

Hello, I cannot understand this line of the email: "Save bandwidth by using a torrent to distribute more efficiently." Can you tell me how to download the model? Thanks.

Not sure how long I can keep this running https://huggingface.co/spaces/chansung/LLaMA-13B

I have seen someone in this issue's message area say that the 7B model needs just 8.5GB of VRAM, but then why does running example.py return out-of-memory on a 24G...
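The gap between the two figures likely comes down to precision: an 8.5GB figure suggests a reduced-precision (e.g. int8-quantized) run, whereas the stock example loads fp16 weights and then allocates activations and cache on top. A hedged comparison, assuming the paper's ~6.7B parameter count for the "7B" model:

```python
# Weight-only memory for the "7B" model at different precisions.
# 6.7B params is the paper's figure; real peak usage is higher due
# to activations, KV cache, and framework overhead.
def weights_gib_at(bytes_per_param: int, n_params_b: float = 6.7) -> float:
    return n_params_b * 1e9 * bytes_per_param / 2**30

fp32 = weights_gib_at(4)  # ~25 GiB: already exceeds a 24GB card
fp16 = weights_gib_at(2)  # ~12.5 GiB: fits, but overhead can still OOM
int8 = weights_gib_at(1)  # ~6.2 GiB: consistent with the ~8.5GB claim
```

So fp16 weights alone fit in 24GB, but default batch/sequence settings can push peak usage past the limit; the 8.5GB claim only makes sense for a quantized run.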