pytorch-llama

LLaMA 2 implemented from scratch in PyTorch

10 pytorch-llama issues

First of all, thank you for the great resources and YouTube videos. I wanted to point out that in slide 25 of the LLaMA notes, regarding the computationally efficient realization...

What is the minimal hardware that can be used for inference only? I have an Ubuntu machine with a 3060 GPU (8 GB); can I use it?

Will it work on Windows with only a CPU?

https://github.com/hkproj/pytorch-llama/blob/067f8a37fe36ac8b52dca9cc6f2a2e8d6aa372d6/model.py#L230-L235 Is there any need to call the forward method explicitly? I mean, we could call the nn.Module directly:

```
h = x + self.attention(self.attention_norm(x), start_pos, freqs_complex)
out = h + self.feed_forward(self.ffn_norm(h))
```
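For context, calling a submodule instance goes through `nn.Module.__call__`, which runs registered hooks and then dispatches to `forward`, so calling the module directly is the idiomatic form. Below is a minimal, self-contained sketch with placeholder layers (`TinyBlock`, plain `nn.Linear` standing in for attention and the MLP); it is not the repository's actual class.

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    # Illustrative stand-in for an encoder block; names are hypothetical.
    def __init__(self, dim: int):
        super().__init__()
        self.attention_norm = nn.LayerNorm(dim)
        self.ffn_norm = nn.LayerNorm(dim)
        self.attention = nn.Linear(dim, dim)     # placeholder for self-attention
        self.feed_forward = nn.Linear(dim, dim)  # placeholder for the MLP

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Calling self.attention(...) invokes nn.Module.__call__, which runs
        # hooks and then dispatches to forward(); writing
        # self.attention.forward(...) skips the hook machinery.
        h = x + self.attention(self.attention_norm(x))
        return h + self.feed_forward(self.ffn_norm(h))

x = torch.randn(2, 8, 32)
print(TinyBlock(32)(x).shape)  # torch.Size([2, 8, 32])
```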

Hello, could you please advise me on how to disable the KV cache? I would also appreciate any guidance on how to implement this change in the code. Thank you for...
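Conceptually, running without a KV cache means re-running the model on the full prefix at every generation step instead of feeding only the newest token. The sketch below is hypothetical: it assumes a `model(tokens)` call that returns logits of shape (batch, seq_len, vocab_size), which differs from the repository's actual forward signature (which also takes `start_pos` and rotary-embedding inputs).

```python
import torch

@torch.no_grad()
def generate_without_cache(model, tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # Greedy decoding without a KV cache: every step recomputes attention
    # over the entire prefix, trading speed for simplicity.
    for _ in range(max_new_tokens):
        logits = model(tokens)                                   # (B, T, V), assumed shape
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=-1)
    return tokens
```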

The mask in https://github.com/hkproj/pytorch-llama/blob/067f8a37fe36ac8b52dca9cc6f2a2e8d6aa372d6/inference.py#L121 should be `~mask`, since we want to select all the indices where the value is less than p.
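To make the mask direction concrete, here is an illustrative top-p (nucleus) sampling routine, not copied from the repository: whether the code reads `mask` or `~mask` depends on whether the boolean marks tokens to drop or tokens to keep.

```python
import torch

def sample_top_p(probs: torch.Tensor, p: float) -> torch.Tensor:
    # Sort descending, keep the smallest set of tokens whose cumulative
    # probability reaches p, zero out the rest, renormalize, then sample.
    probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
    probs_sum = torch.cumsum(probs_sort, dim=-1)
    # True where the cumulative mass *excluding* the current token already
    # exceeds p, i.e. tokens outside the nucleus that should be dropped.
    mask = probs_sum - probs_sort > p
    probs_sort[mask] = 0.0
    probs_sort.div_(probs_sort.sum(dim=-1, keepdim=True))
    next_token = torch.multinomial(probs_sort, num_samples=1)
    return torch.gather(probs_idx, -1, next_token)  # map back to vocab indices
```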

Why is a causal attention mask not used?

In your source code ![image](https://github.com/user-attachments/assets/31fac10b-8019-4c78-ab7b-d65f910ce866), the first time forward is called you use tokens 0:1 from each batch. This is not true to LLaMA 2 (decoder-only); LLaMA 2 inference can be divided into prefill...
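For reference, the prefill/decode split the issue describes looks roughly like the sketch below. It is a hypothetical loop: it assumes `model(tokens, start_pos)` writes the keys/values for positions [start_pos, start_pos + tokens.shape[1]) into its cache and returns logits for those tokens, whereas the repository's actual forward also takes precomputed rotary-embedding inputs.

```python
import torch

@torch.no_grad()
def generate_with_cache(model, prompt_tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    tokens = prompt_tokens
    # Prefill: run the whole prompt once so the KV cache holds every prompt position.
    logits = model(tokens, start_pos=0)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    tokens = torch.cat([tokens, next_token], dim=-1)
    # Decode: feed only the newest token; earlier K/V comes from the cache.
    for _ in range(max_new_tokens - 1):
        logits = model(tokens[:, -1:], start_pos=tokens.shape[1] - 1)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=-1)
    return tokens
```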