EAGLE
How to handle embedding layernorm
Some models apply a LayerNorm to the embedding output before passing it to the attention layers. When working with this type of model, do I need to add the embedding LayerNorm to EAGLE, or is there some other trick required to make EAGLE output the right tokens?

I also don't understand why the `-2` is needed when generating training data for Llama, or how to adapt that `-2` in my own ge_data script for a different model. So far I have tried generating data without the `-2`, and training EAGLE both with and without the embedding LayerNorm, but neither combination gives good results in parallel decoding. I'm confused. The model is BlueLM-7B-Chat. Thanks for helping me!
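For reference, here is a minimal sketch of what "adding the embedding LayerNorm to the draft model" could look like. This is an illustrative assumption, not code from the EAGLE repo: the class and parameter names (`DraftInputWithEmbedNorm`, `hidden_size`, `eps`) are hypothetical, and it only shows the input-fusion step where EAGLE concatenates the token embedding with the base model's hidden state.

```python
import torch
import torch.nn as nn

class DraftInputWithEmbedNorm(nn.Module):
    """Hypothetical sketch: apply the base model's post-embedding
    LayerNorm inside the draft model, so the draft sees the same
    normalized embeddings that the base model's attention layers see.
    All names here are illustrative, not taken from the EAGLE repo."""

    def __init__(self, vocab_size: int, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        # Mirror of the base model's embedding LayerNorm; in practice
        # its weights would be copied from the base checkpoint.
        self.embed_layernorm = nn.LayerNorm(hidden_size, eps=eps)
        # EAGLE-style fusion of token embedding and base hidden state.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, input_ids: torch.Tensor,
                hidden_states: torch.Tensor) -> torch.Tensor:
        emb = self.embed_tokens(input_ids)
        emb = self.embed_layernorm(emb)  # normalize before fusing
        return self.fc(torch.cat([emb, hidden_states], dim=-1))
```

The idea is that if the base model normalizes embeddings before its first attention layer, the draft model trained on the base model's hidden states should probably consume embeddings in the same (normalized) space; whether that actually fixes the acceptance rate for BlueLM would need to be verified empirically.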