flashinfer
[Feature] Llama3.1 RoPE on the fly
Are there any plans to add more options for `pos_encoding_mode`? Currently `"LLAMA"` runs with Llama 3.1+ models, but the resulting rotary embeddings are subtly incorrect and accuracy suffers slightly.
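For context, here is a rough sketch of the difference as I understand it: Llama 3.1 rescales the RoPE inverse frequencies by wavelength before applying the rotation, whereas plain Llama-style RoPE uses them unscaled. This is not flashinfer code or its API. The function name `llama31_inv_freq` is made up for illustration, and the defaults (`base=500000`, `factor=8`, `low_freq_factor=1`, `high_freq_factor=4`, `old_context_len=8192`) are my assumption of the values in the published Llama 3.1 configs, so please check them against the model's `rope_scaling` entry.

```python
import math

def llama31_inv_freq(head_dim=128, base=500000.0, factor=8.0,
                     low_freq_factor=1.0, high_freq_factor=4.0,
                     old_context_len=8192):
    """Hypothetical helper: Llama 3.1 style scaled RoPE inverse frequencies.

    Defaults are assumed from the public Llama 3.1 configs, not from flashinfer.
    """
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for i in range(0, head_dim, 2):
        # Plain RoPE inverse frequency for this pair of dimensions.
        freq = 1.0 / (base ** (i / head_dim))
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency band: left unchanged.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency band: divided by the scaling factor.
            scaled.append(freq / factor)
        else:
            # Middle band: interpolate between scaled and unscaled.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled

# Compare against plain RoPE to see which dimension pairs differ.
plain = [1.0 / (500000.0 ** (i / 128)) for i in range(0, 128, 2)]
scaled = llama31_inv_freq()
changed = sum(1 for p, s in zip(plain, scaled) if not math.isclose(p, s))
print(f"{changed} of {len(plain)} frequencies differ from plain RoPE")
```

If this is right, only the lower-frequency dimensions are affected, which would explain why the output with `"LLAMA"` is close but not exact.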