grimulkan issues

Results: 5 issues of grimulkan

10x slowdown using Vicuna 13b 4bit commit f2bf1a2 onwards

Describe the bug: Commit 5744b315930c6b0fd4dc6c1df96b2724253366d8 is fine (~10 tok/s). Commit f2bf1a2c9e0074ea7cb98e6d9176940998ba0559 and later commits are 10x slower in streaming mode (...

bug

EOS token question in multi-round in OASST

I'm trying to figure out if the format for multi-round conversations has an EOS token appended at the end of each assistant reply in the history, or none at all...

YaRN Support

Any thoughts/plans about YaRN support for the positional embeddings? https://github.com/jquesnelle/yarn I don't actually see them beat regular linear scaling w/ fine-tuning in the paper, but presumably it extends beyond the...

Return 32-bit for external accumulation

Would it be possible to return higher precision tensors where relevant, as an option, to allow users to break apart attention computation in blocks? For example, in ring + flash...

Numerical errors in backward

Were you able to find out the reason for the small numerical errors in backward pass with ring flash attention? I found the errors increase as you increase the world...