grimulkan
### Describe the bug

Commit 5744b315930c6b0fd4dc6c1df96b2724253366d8 is fine (~10 tok/s). Commit f2bf1a2c9e0074ea7cb98e6d9176940998ba0559 and later commits are 10x slower in streaming mode (
I'm trying to figure out if the format for multi-round conversations has an EOS token appended at the end of each assistant reply in the history, or none at all...
Any thoughts/plans about YaRN support for the positional embeddings? https://github.com/jquesnelle/yarn I don't actually see them beat regular linear scaling w/ fine-tuning in the paper, but presumably it extends beyond the...
Would it be possible to return higher precision tensors where relevant, as an option, to allow users to break apart attention computation in blocks? For example, in ring + flash...
Were you able to find out the reason for the small numerical errors in backward pass with ring flash attention? I found the errors increase as you increase the world...
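For context on the two attention questions above: when attention is split into key/value blocks, each block's output is usually merged using its log-sum-exp, so the precision of the tensors carried between blocks matters. Below is a minimal NumPy sketch of that merge, not the code of any flash attention or ring attention library. The names `attention_block`, `merge_blocks`, `blockwise_attention`, and `carry_dtype` are hypothetical and only serve the illustration, and the sketch covers the forward pass only.

```python
import numpy as np

def attention_block(q, k, v):
    """Softmax attention over one key/value block.

    Returns the normalized output and the per-row log-sum-exp of the scores.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    m = scores.max(axis=-1, keepdims=True)      # row max for a stable exp
    p = np.exp(scores - m)
    s = p.sum(axis=-1, keepdims=True)
    return (p @ v) / s, m + np.log(s)

def merge_blocks(out_a, lse_a, out_b, lse_b):
    """Combine two partial results using their log-sum-exp values."""
    lse = np.logaddexp(lse_a, lse_b)
    out = np.exp(lse_a - lse) * out_a + np.exp(lse_b - lse) * out_b
    return out, lse

def blockwise_attention(q, k, v, block_size, carry_dtype=np.float32):
    """Attention over K/V in blocks.

    carry_dtype (hypothetical knob) is the precision in which each block's
    output and log-sum-exp are handed to the merge step.
    """
    out, lse = None, None
    for start in range(0, k.shape[0], block_size):
        stop = start + block_size
        o_blk, l_blk = attention_block(q, k[start:stop], v[start:stop])
        o_blk, l_blk = o_blk.astype(carry_dtype), l_blk.astype(carry_dtype)
        if out is None:
            out, lse = o_blk, l_blk
        else:
            out, lse = merge_blocks(out, lse, o_blk, l_blk)
    return out, lse

# Compare the merged result against single-pass attention in float64.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((64, 16))
v = rng.standard_normal((64, 16))
reference, _ = attention_block(q, k, v)
for dtype in (np.float16, np.float32):
    merged, _ = blockwise_attention(q, k, v, block_size=8, carry_dtype=dtype)
    print(dtype.__name__, np.abs(merged - reference).max())
```

Running it prints the maximum absolute deviation from the single-pass result for each carry precision, which gives a quick way to see how much the dtype of the intermediate tensors contributes as the number of blocks grows.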