
Some questions about your source code

Open shawnnjupt opened this issue 10 months ago • 0 comments

In your source code, the first call to forward only uses tokens 0:1 of each batch. That is not correct for LLaMA 2 (decoder-only): inference is divided into a prefill step and decode steps. In the prefill step, all prompt tokens should be passed through forward, not just the first token; the prefill step produces the first new token, which is appended to the token list, and only then is the next token generated. So the computation is: first over [sequence_length, dim], then (using the KV cache) over only [1, dim]. You can see the correct code in the llama repo: https://github.com/meta-llama/llama/blob/main/llama/generation.py
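For illustration, here is a minimal sketch of the prefill/decode split with a KV cache. The `model.forward(tokens, start_pos)` signature, the internal cache, and the greedy argmax sampling are assumptions for the example, loosely mirroring the reference generation loop; they are not the exact API of this repo.

```python
import torch

@torch.inference_mode()
def generate(model, prompt_tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # prompt_tokens: (batch, prompt_len)
    # Assumed: model.forward(tokens, start_pos) fills/uses an internal KV cache.
    tokens = prompt_tokens
    prompt_len = tokens.shape[1]

    # Prefill: run the WHOLE prompt through the model once.
    # Each layer sees (batch, prompt_len, dim); this fills the KV cache for
    # positions 0 .. prompt_len-1 and yields the first new token.
    logits = model.forward(tokens, start_pos=0)            # (batch, prompt_len, vocab)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    tokens = torch.cat([tokens, next_token], dim=1)

    # Decode: from now on, feed only the single newest token.
    # Each layer sees (batch, 1, dim); the KV cache supplies keys/values
    # for all earlier positions.
    for cur_pos in range(prompt_len, prompt_len + max_new_tokens - 1):
        logits = model.forward(tokens[:, -1:], start_pos=cur_pos)  # (batch, 1, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)

    return tokens
```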

Also, I think the decoder layers are named "encoder layer" in your source code.

shawnnjupt · Dec 13 '24