
Some questions about your source code

Open shawnnjupt opened this issue 10 months ago • 0 comments

In your source code, the first call to forward only uses tokens 0:1 of each batch. That is not correct for LLaMA 2 (decoder-only): inference is divided into a prefill step and decode steps. In the prefill step, all prompt tokens should be passed through forward, not just the first token; the prefill step produces the first new token, which is appended to the token list, and only then is the next token generated. So the computation is: first over [sequence_length, dim], then (using the KV cache) over only [1, dim]. You can see the correct code in the llama repo: https://github.com/meta-llama/llama/blob/main/llama/generation.py
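For illustration, here is a minimal sketch of the prefill/decode split with a KV cache. The `model.forward(tokens, start_pos)` signature, the internal cache, and the greedy argmax sampling are assumptions for the example, loosely mirroring the reference generation loop; they are not the exact API of this repo.

```python
import torch

@torch.inference_mode()
def generate(model, prompt_tokens: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # prompt_tokens: (batch, prompt_len)
    # Assumed: model.forward(tokens, start_pos) fills/uses an internal KV cache.
    tokens = prompt_tokens
    prompt_len = tokens.shape[1]

    # Prefill: run the WHOLE prompt through the model once.
    # Each layer sees (batch, prompt_len, dim); this fills the KV cache for
    # positions 0 .. prompt_len-1 and yields the first new token.
    logits = model.forward(tokens, start_pos=0)            # (batch, prompt_len, vocab)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    tokens = torch.cat([tokens, next_token], dim=1)

    # Decode: from now on, feed only the single newest token.
    # Each layer sees (batch, 1, dim); the KV cache supplies keys/values
    # for all earlier positions.
    for cur_pos in range(prompt_len, prompt_len + max_new_tokens - 1):
        logits = model.forward(tokens[:, -1:], start_pos=cur_pos)  # (batch, 1, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)

    return tokens
```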

Also, I think the decoder layers are named "encoder layer" in your source code.

shawnnjupt · Dec 13 '24