turboderp
# What does this PR do?

- Reverses the order of global and sliding attention layers in Gemma2. This brings it in line with [Google's implementation](https://github.com/google/gemma_pytorch/blob/1814f8d0a6ba93b875c46a64e6ad1873df448eef/gemma/config.py#L118), in which sliding attention...
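For illustration, here is a minimal sketch (not the transformers code itself) of the layer pattern after the reversal, assuming, per the linked config, that sliding-window attention lands on even layer indices and global attention on odd ones:

```python
# Toy illustration only -- not the transformers implementation.
# Assumption: after the reversal, even layer indices use sliding-window
# attention and odd ones use global attention, mirroring Google's
# [LOCAL_SLIDING, GLOBAL] pattern repeated over the layer stack.

def gemma2_attention_types(num_layers: int) -> list[str]:
    return ["sliding" if i % 2 == 0 else "global" for i in range(num_layers)]

print(gemma2_attention_types(6))
# ['sliding', 'global', 'sliding', 'global', 'sliding', 'global']
```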
The `mha_fwd_kvcache` function contains this GQA optimization that triggers whenever `seqlen_q` is 1, with a few other conditions:

```c++
// Faster to transpose q from (b, 1, (nheads_kv ngroups), d)...
```
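To make the trick concrete, here is a small PyTorch sketch of the reshape that comment describes, using made-up shapes (batch 2, 8 KV heads, 4 query heads per KV head, head dim 128):

```python
import torch

# Made-up shapes for illustration: batch 2, 8 KV heads, 4 query heads per
# KV head (32 query heads total), head dim 128, and seqlen_q == 1 (decode).
b, nheads_kv, ngroups, d = 2, 8, 4, 128
q = torch.randn(b, 1, nheads_kv * ngroups, d)

# Fold the GQA groups into the (singleton) query-length dimension, so the
# kernel effectively sees seqlen_q = ngroups and nheads = nheads_kv, i.e. an
# MHA-shaped problem with more work per KV head for the single-token case.
q_swapped = q.reshape(b, nheads_kv, ngroups, d).transpose(1, 2)

print(q_swapped.shape)  # torch.Size([2, 4, 8, 128])
```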
Are there any plans to add more options for `pos_encoding_mode`? Currently `"LLAMA"` works for Llama 3.1+ models, but the embeddings are subtly incorrect and accuracy suffers a bit.
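For reference, Llama 3.1 does not use plain RoPE: it rescales the inverse frequencies before applying the rotation, which is presumably why the plain mode is close but not exact. A minimal sketch of that rescaling (the function name is mine; the defaults are the published Llama 3.1 scaling parameters):

```python
import math

def llama31_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                           high_freq_factor=4.0, old_context_len=8192):
    """Rescale RoPE inverse frequencies the way Llama 3.1 does.

    High-frequency (short-wavelength) components are kept as-is,
    low-frequency components are divided by `factor`, and the band in
    between is linearly interpolated between the two.
    """
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:      # high frequency: unchanged
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:     # low frequency: scaled down
            scaled.append(freq / factor)
        else:                                # smooth transition band
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```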