aikitoria
### Have you searched for similar requests? Yes ### Is your feature request related to a problem? If so, please describe. _No response_ ### Describe the solution you'd like It...
### Have you searched for similar requests? Yes ### Is your feature request related to a problem? If so, please describe. _No response_ ### Describe the solution you'd like I've...
It would be very nice if the library supported using Min-P sampling as an alternative to Top-P/Top-K. This became popular for local LLMs in the past few months because it...
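For readers unfamiliar with the request above, here is a minimal sketch of Min-P filtering as it is commonly described: tokens are kept only if their probability is at least `min_p` times the probability of the most likely token, and the survivors are renormalized. The function name `min_p_filter` and the default of `0.1` are illustrative assumptions, not part of any library's API.

```python
import math

def min_p_filter(logits, min_p=0.1):
    """Return renormalized probabilities after Min-P filtering.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Softmax with max-subtraction for numerical stability.
    top_logit = max(logits)
    exps = [math.exp(x - top_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep tokens whose probability is at least min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

filtered = min_p_filter([2.0, 1.0, -5.0], min_p=0.1)
```

Because the cutoff scales with the top probability, the filter is strict when the model is confident and permissive when the distribution is flat, which is the usual argument for it over a fixed Top-P or Top-K cutoff.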
**Is your feature request related to a problem? Please describe.** hf_transfer is very fast for individual files, but for models with many split files, it's not quite as fast as...
### Have you searched for similar requests? Yes ### Is your feature request related to a problem? If so, please describe. _No response_ ### Describe the solution you'd like Currently...
It would be great to support this new model! https://cohere.com/blog/command-a They use a fairly unique architecture, where some layers use sliding window attention while others use global attention with no...
### System Info - 8x 4090 on dual Epyc server running Debian testing - CUDA toolkit version 12.8, driver version 570.86 - Release container compiled from release 0.17 tag ###...
This adds support for Cohere2ForCausalLM architecture which interleaves global layers without position embedding with sliding window layers with rope positions. I also fixed the RuntimeDefaults thing not actually working in...
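The interleaving described above can be sketched as a per-layer plan. This is only an illustration: the function name `layer_attention_plan` and the `global_every` parameter are hypothetical, and the default period of 4 is an assumption, not a value taken from the PR.

```python
def layer_attention_plan(num_layers, global_every=4):
    """Describe which attention variant each layer uses.

    Hypothetical helper: every `global_every`-th layer is treated as a
    global-attention layer without position embedding; all other layers
    use sliding window attention with RoPE. The period is an assumption.
    """
    plan = []
    for layer_idx in range(num_layers):
        is_global = (layer_idx + 1) % global_every == 0
        plan.append({
            "layer": layer_idx,
            "attention": "global" if is_global else "sliding_window",
            # Global layers skip position embedding; sliding layers use RoPE.
            "use_rope": not is_global,
        })
    return plan

plan = layer_attention_plan(8)
```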
This adds FP8 support for the LayerNorm kernel in the same way as was done for the RmsNorm kernel, which then allows us to use FP8 Rowwise quantization with the...
### Checklist - [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed. - [x]...