mistral.rs

Blazingly fast LLM inference.

186 mistral.rs issues, sorted by recently updated.

I added some code that prints the queue state: https://github.com/EricLBuehler/mistral.rs/pull/138

I ran it on a single generation:

```
2024-04-14T17:34:50.601969Z INFO mistralrs_core::engine: Prompt[] Completion[210] - 21ms
```

And on batches: ```...
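A minimal sketch of a formatter that produces log lines like the one above. It assumes the bracketed lists hold the IDs of sequences currently in the prompt (prefill) phase and in the completion (decode) phase, and that the trailing value is the step duration. That reading of the log, and the names `QueueState`, `join_ids`, and `format_queue_state`, are hypothetical and not taken from PR 138.

```rust
use std::time::Duration;

/// Hypothetical snapshot of the scheduler queues (assumed meaning of the log fields).
struct QueueState {
    /// IDs of sequences still processing their prompt.
    prompt: Vec<usize>,
    /// IDs of sequences generating completion tokens.
    completion: Vec<usize>,
}

/// Join sequence IDs as "1, 2, 3".
fn join_ids(ids: &[usize]) -> String {
    ids.iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

/// Format one engine step in the shape `Prompt[...] Completion[...] - Nms`.
fn format_queue_state(state: &QueueState, step_time: Duration) -> String {
    format!(
        "Prompt[{}] Completion[{}] - {}ms",
        join_ids(&state.prompt),
        join_ids(&state.completion),
        step_time.as_millis()
    )
}
```

With an empty prompt queue, one completing sequence with ID 210, and a 21 ms step, this yields the same text as the single-generation log line.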

Since generation speed nearly matches llama.cpp after https://github.com/EricLBuehler/mistral.rs/pull/152, I think it's worth trying to optimize prompt processing now.

- [ ] RowParallelLinear
- [ ] MergedColumnParallelLinear
- [ ] QKVParallelLinear

paged-attention
backend
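The checklist above names tensor-parallel linear layers (the names follow vLLM, where `MergedColumnParallelLinear` and `QKVParallelLinear` are column-parallel variants that fuse several projections into one weight). The sketch below is plain Rust on `Vec<f32>`, not candle code, and only illustrates the two sharding schemes: column-parallel splits the weight along the output dimension and concatenates the shard outputs, while row-parallel splits along the input dimension and sums partial outputs (an all-reduce across GPUs in a real setup). The function names are hypothetical.

```rust
/// y = W x, with W stored as `out` rows of length `inp`.
fn matvec(w: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// Column-parallel: shard W along the output dimension (its rows here).
/// Each shard computes a disjoint slice of y; slices are concatenated.
fn column_parallel(w: &[Vec<f32>], x: &[f32], shards: usize) -> Vec<f32> {
    let chunk = (w.len() + shards - 1) / shards;
    w.chunks(chunk).flat_map(|rows| matvec(rows, x)).collect()
}

/// Row-parallel: shard W (and x) along the input dimension.
/// Each shard computes a full-length partial y; partials are summed,
/// which is the all-reduce step on a multi-GPU setup.
fn row_parallel(w: &[Vec<f32>], x: &[f32], shards: usize) -> Vec<f32> {
    let inp = x.len();
    let chunk = (inp + shards - 1) / shards;
    let mut y = vec![0.0f32; w.len()];
    for start in (0..inp).step_by(chunk) {
        let end = (start + chunk).min(inp);
        // Slice out this shard's input columns from every row of W.
        let w_shard: Vec<Vec<f32>> = w.iter().map(|row| row[start..end].to_vec()).collect();
        let partial = matvec(&w_shard, &x[start..end]);
        for (acc, p) in y.iter_mut().zip(partial) {
            *acc += p;
        }
    }
    y
}
```

Both schemes reproduce the unsharded product; they differ in which communication step (concatenate vs. sum) is needed afterwards.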

Refs and closes #215.

API addition:
- DeviceMapper
- All at-loading-time methods have a `loading_isq` parameter
- Add `fn set_nm_device, loading_isq: bool) -> VarBuilder

backend
models
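A hypothetical sketch of what a `loading_isq` flag on a device mapper's loading-time methods could look like. It assumes, without confirmation from the PR text, that when in-situ quantization (ISQ) is active the weights are first loaded onto the CPU so they can be quantized before being moved to their target device. `Dev`, `load_device`, and `LayerSplit` are illustrative names, not the mistral.rs or candle API.

```rust
/// Hypothetical device type for illustration (not candle's `Device`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dev {
    Cpu,
    Gpu(usize),
}

/// Hypothetical mapper trait: loading-time methods take `loading_isq`.
trait DeviceMapper {
    /// Device onto which a layer's weights are loaded.
    fn load_device(&self, layer: usize, loading_isq: bool) -> Dev;
}

/// Hypothetical mapper: the first `gpu_layers` layers go to GPU 0, the rest to CPU.
struct LayerSplit {
    gpu_layers: usize,
}

impl DeviceMapper for LayerSplit {
    fn load_device(&self, layer: usize, loading_isq: bool) -> Dev {
        if loading_isq {
            // Assumption: load on CPU first; quantize and move later.
            Dev::Cpu
        } else if layer < self.gpu_layers {
            Dev::Gpu(0)
        } else {
            Dev::Cpu
        }
    }
}
```

Threading the flag through every loading-time method keeps the placement decision in one place instead of special-casing ISQ at each call site.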

Argsort was just added to Candle (https://github.com/huggingface/candle/pull/2132). Using an argsort kernel will accelerate the sorting step of `topk` and `topp` sampling, which currently runs on the CPU and takes a lot of time.

optimization
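For reference, a CPU-only sketch of the sorting-based sampling filter the issue wants to accelerate: an argsort of the probabilities in descending order, then top-k truncation followed by top-p (nucleus) truncation. This is plain Rust on a slice, not the mistral.rs sampler; applying top-k before top-p is one common ordering and is an assumption here. An argsort kernel would compute the same permutation on the device.

```rust
/// Indices that sort `probs` in descending order (argsort on the CPU).
fn argsort_desc(probs: &[f32]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    // `total_cmp` gives a total order on f32; comparing b to a sorts descending.
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    idx
}

/// Token indices kept after top-k, then top-p filtering.
fn top_k_top_p(probs: &[f32], k: usize, p: f32) -> Vec<usize> {
    let mut kept = Vec::new();
    let mut cumulative = 0.0f32;
    for i in argsort_desc(probs).into_iter().take(k) {
        kept.push(i);
        cumulative += probs[i];
        // Stop once the kept tokens cover at least probability mass p.
        if cumulative >= p {
            break;
        }
    }
    kept
}
```

For `probs = [0.1, 0.5, 0.15, 0.25]`, `k = 3`, and `p = 0.7`, the kept indices are `[1, 3]`: their mass 0.75 is the first cumulative value to reach 0.7. The sampler would then renormalize over those indices and draw one.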

Closes https://github.com/EricLBuehler/mistral.rs/issues/235

Continuing https://github.com/EricLBuehler/mistral.rs/pull/219. Closes https://github.com/EricLBuehler/mistral.rs/issues/216

I'm creating this issue to track work on adding async channels to avoid blocking in the server. https://github.com/EricLBuehler/mistral.rs/pull/233 was reverted.

fix

I found it while testing https://github.com/EricLBuehler/mistral.rs/pull/236