Rotary position embedding causes different outputs under different tensor parallel settings!
Thanks for your great work on LLMs. I have tried loading llama-13b with different mp (model parallel) sizes, e.g., 2 and 4. However, the output embeddings and the generated sentences change when the mp setting changes.
My question: is this normal?
(screenshot: output with mp size = 4)
(screenshot: output with mp size = 2)
With mp size = 4, the output embedding has mean -3.8359 and std 1.9458; both the mean and the std change when mp size = 2.
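Small numerical differences across tensor-parallel sizes are generally expected: changing the number of shards changes the order in which partial results are reduced, and floating-point addition is not associative. A minimal sketch (my own illustration, not code from this repo) of the effect, splitting one dot product into 2 vs. 4 partitions the way tensor parallelism splits the contraction dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
w = rng.standard_normal(4096).astype(np.float32)

def sharded_dot(x, w, parts):
    # Split the contraction dimension into `parts` shards and sum the
    # partial dot products, mimicking the all-reduce over tensor-parallel
    # ranks. Different `parts` => different accumulation order.
    idx = np.array_split(np.arange(x.size), parts)
    return np.float32(sum(np.dot(x[i], w[i]) for i in idx))

d2 = sharded_dot(x, w, 2)
d4 = sharded_dot(x, w, 4)
# The two results are close but may differ in the last few bits,
# which compounds over many layers and can flip sampled tokens.
print(d2, d4, abs(float(d2) - float(d4)))
```

Per-layer discrepancies of this size are harmless on their own, but over dozens of transformer layers they can accumulate enough to change greedy or sampled generations, so differing outputs across mp sizes do not by themselves indicate a bug.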