
Inference code for Llama models

412 llama issues

## Describe the bug I was using meta-llama/Llama-2-7b-chat-hf from Hugging Face in a RAG model and it used to work perfectly, but then I suddenly received this error: ```...

How can I run inference in C?

I have requested access at https://llama.meta.com/llama-downloads/ and have waited over two weeks for access to the Llama models for my MS thesis research, using both my university and personal Gmail...

model-access

In a multi-threaded situation, if the GPU server's resources are insufficient, will KV cache preemption occur? For example, if there are two conversations at the same time, both of which are...
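Whether preemption occurs depends on the serving implementation; many inference servers do evict or swap KV-cache blocks under memory pressure. As an illustration of the idea only, and not code from this repository (all names below are hypothetical), an LRU-style preemption policy might look like:

```python
from collections import OrderedDict

class KVCachePool:
    """Toy model of per-conversation KV-cache slots with LRU preemption.
    Hypothetical sketch only; not the actual serving code."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.slots = OrderedDict()   # conversation id -> cached KV entries

    def access(self, conv_id, kv_entries):
        """Touch a conversation's cache. Returns the id of a preempted
        conversation if one had to be evicted, else None."""
        evicted = None
        if conv_id in self.slots:
            self.slots.move_to_end(conv_id)              # mark most recently used
        elif len(self.slots) >= self.max_slots:
            evicted, _ = self.slots.popitem(last=False)  # preempt the LRU conversation
        self.slots[conv_id] = kv_entries
        return evicted
```

Under this policy a third concurrent conversation would preempt whichever of the two existing conversations was least recently touched, forcing its prefix to be recomputed when it resumes.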

What is the reason behind, and how do I fix, this error: ```shell RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found! ```? I'm trying to run `example_text_completion.py` with:...
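The NCCL backend requires CUDA GPUs, so on a CPU-only machine the distributed process group cannot initialize. One commonly suggested workaround (an assumption, not verified against this repo's scripts) is to initialize the process group with the CPU-capable `gloo` backend instead of `nccl`; a minimal single-process sketch:

```python
import os
import torch.distributed as dist

# gloo runs on CPU; NCCL needs CUDA devices.
# Rendezvous settings for a single local process (hypothetical values).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend())
dist.destroy_process_group()
```

Note that the llama example scripts initialize the process group themselves, so applying this in practice would mean changing the backend they pass rather than calling `init_process_group` separately.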

download-install

**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the [FAQs](https://ai.meta.com/llama/faq/) and [existing/past issues](https://github.com/facebookresearch/llama/issues)** ## Describe the bug I only have 1 GPU,...

Fixed a small doc error: "evaluation were also performed on third-party cloud compute" -> "evaluation were also performed on third-party cloud computing"

CLA Signed

## Describe the bug ### Minimal reproducible example ```python...

I use the code below to add about 100 tokens to the model and tokenizer, and the model doubles in size. ``` from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("llama-7b-model") model = AutoModel.from_pretrained("llama-7b-model")...
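A plausible cause (an assumption; the truncated snippet does not confirm it): `from_pretrained` without a `torch_dtype` argument materializes a float16 checkpoint as float32, so re-saving doubles the on-disk size, while ~100 extra token embeddings add only a negligible amount. A back-of-the-envelope sketch with hypothetical figures:

```python
# Hypothetical figures for a 7B-parameter model; not measured values.
n_params = 7_000_000_000
new_embedding_params = 100 * 4096        # ~100 added tokens x assumed hidden size 4096

fp16_bytes = n_params * 2                                # checkpoint stored as float16
fp32_bytes = (n_params + new_embedding_params) * 4       # reloaded/saved as float32

print(round(fp32_bytes / fp16_bytes, 2))                 # ~2.0: dtype, not new tokens
```

If this is the cause, passing `torch_dtype=torch.float16` (or `torch_dtype="auto"`) to `from_pretrained` should keep the saved model near its original size.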