Eric Buehler
**Describe the bug** This affects models that use sliding window attention, but only when the sequence is long enough (seq_len > sliding_window) for the sliding window to take effect. This will...
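To illustrate why such a bug would only show up once seq_len > sliding_window, here is a minimal sketch in plain Rust. It is not the code from the affected project: the function names are hypothetical, and it assumes the convention that each position attends to itself and the `sliding_window - 1` positions before it. Under that assumption the sliding-window mask is identical to an ordinary causal mask until the sequence outgrows the window.

```rust
/// Hypothetical helper: mask[i][j] is true when query position `i` may attend
/// to key position `j` under causal sliding-window attention.
/// Assumed convention: position `i` sees itself and the previous
/// `sliding_window - 1` positions.
fn sliding_window_mask(seq_len: usize, sliding_window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                // `j <= i` is checked first, so `i - j` cannot underflow.
                .map(|j| j <= i && i - j < sliding_window)
                .collect()
        })
        .collect()
}

/// Hypothetical helper: ordinary causal mask, for comparison.
fn causal_mask(seq_len: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j <= i).collect())
        .collect()
}
```

With `sliding_window = 4`, the two masks agree for `seq_len = 3` and first differ at `seq_len = 5`, where position 4 can no longer attend to position 0. Any code path that handles the window incorrectly would therefore go unnoticed on short inputs.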
Hello everyone, Thank you for your great work here. Our project makes extensive use of struct enum variants. However, we have many variants which should have default values: some of...
This increases compatibility with OpenAI and llama-cpp-python. I would appreciate any thoughts on this change. # Breaking This breaks any code which uses the chat completion API as it removes...
This also updates the loading process to track loading of shards instead of tensors. This will enable loading in Jupyter without being rate limited and hanging.
Hello all, Thank you for your excellent work here. I am trying to load a `tokenizer.model` file in my Rust application. However, it seems that the `Tokenizer::from_file` function only support...
Dynamic LoRA swapping, first raised in #259, enables the user to dynamically set active LoRA adapters. This can be configured per-request to enable users to add their own routing functionality....
Hello all, Thanks for your great work here. We are implementing speculative decoding at mistral.rs, and were in the final stages of testing when we discovered some incredibly strange behavior....