grimulkan
From my limited understanding, the authors claim that NTK-alpha scaling effectively extrapolates some dimensions, unlike linear scaling, which never does. This, they say, is why it is...
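To make that contrast concrete, here is a minimal NumPy sketch of the two scalings (the head dimension, context lengths, and scale factor are illustrative, not from any particular model). Linear interpolation compresses every dimension's rotation angle by the same factor, so no angle ever exceeds what was seen in training; the NTK-alpha base correction `alpha ** (dim / (dim - 2))` barely touches the high-frequency dimensions, so at long positions their angles run past the trained range, i.e. those dimensions are extrapolated:

```python
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    # Per-pair rotary frequencies: theta_i = base^(-2i/dim)
    return base ** (-np.arange(0, dim, 2) / dim)

def linear_scaled_angles(pos: int, dim: int, scale: float) -> np.ndarray:
    # Linear (position) interpolation: every dimension's angle is
    # compressed by the same factor -- pure interpolation, no angle
    # ever exceeds the range covered during training.
    return (pos / scale) * rope_freqs(dim)

def ntk_scaled_angles(pos: int, dim: int, alpha: float) -> np.ndarray:
    # NTK-alpha: stretch the base instead of the positions, using the
    # alpha^(dim/(dim-2)) correction from the original NTK-aware RoPE
    # post. High-frequency (low-i) dimensions are barely changed, so
    # at long positions their angles exceed the trained range.
    base = 10000.0 * alpha ** (dim / (dim - 2))
    return pos * (base ** (-np.arange(0, dim, 2) / dim))

dim, trained_ctx, pos = 128, 2048, 8192        # illustrative values
trained_max = trained_ctx * rope_freqs(dim)    # max angles seen in training
lin = linear_scaled_angles(pos, dim, scale=4)
ntk = ntk_scaled_angles(pos, dim, alpha=4)
print("linear dims beyond trained range:", (lin > trained_max).sum())  # 0
print("NTK dims beyond trained range:  ", (ntk > trained_max).sum())  # > 0
```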
With Llama 405B there are many layers, and with ring sizes of 4 or 8 the numerical errors become catastrophic in the backward pass. The errors actually originate in the forward pass...
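For what it's worth, the forward-side accumulation is easy to reproduce in isolation. Below is a single-process toy sketch (PyTorch, my own construction, not the actual ring-attention kernel) of the online-softmax merge that ring attention performs once per ring step; running the merges in bf16 against an fp64 reference shows the rounding error that more ring steps tend to introduce, before the backward pass ever amplifies it:

```python
import torch

def online_softmax_attn(q, k, v, ring_size, dtype):
    # One query block attending to key/value chunks one at a time,
    # merging running (max, denominator, output) statistics after
    # each chunk -- the same merge ring attention does per ring step.
    q = q.to(dtype)
    m = torch.full((q.shape[0],), float("-inf"), dtype=dtype)
    l = torch.zeros(q.shape[0], dtype=dtype)
    o = torch.zeros(q.shape[0], v.shape[1], dtype=dtype)
    for kc, vc in zip(k.to(dtype).chunk(ring_size), v.to(dtype).chunk(ring_size)):
        s = q @ kc.T                        # block of attention scores
        m_new = torch.maximum(m, s.max(dim=1).values)
        scale = (m - m_new).exp()           # rescale the old statistics
        p = (s - m_new[:, None]).exp()
        l = l * scale + p.sum(dim=1)
        o = o * scale[:, None] + p @ vc
        m = m_new
    return o / l[:, None]

torch.manual_seed(0)
q, k, v = (torch.randn(64, 64, dtype=torch.float64) for _ in range(3))
ref = online_softmax_attn(q, k, v, 1, torch.float64)  # exact reference
for ring_size in (1, 4, 8):
    out = online_softmax_attn(q, k, v, ring_size, torch.bfloat16)
    err = (out.double() - ref).abs().max().item()
    print(f"ring_size={ring_size}: max forward error {err:.3e}")
```

Stacking many layers on top of a forward error like this is presumably what makes the backward pass blow up at scale.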
Any interest in re-opening, now that we have DS-R1?
https://github.com/ggml-org/llama.cpp/issues/7343 is what is going on here, I think.