mistral.rs
Blazingly fast LLM inference.
**Describe the bug** If two requests are sent to the server at roughly the same time, it will start to respond to both requests and then crash with the following...
My Mac has an M1 chip. When I execute the following command: `cargo run --release --features mkl -- -i plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama`, the following error occurs. Does it mean that...
This increases compatibility with OpenAI and llama-cpp-python. I would appreciate any thoughts on this change. # Breaking This breaks any code which uses the chat completion API as it removes...
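For context, an OpenAI-style chat completion request (the shape that llama-cpp-python and other OpenAI-compatible clients send) looks roughly like the fragment below; field names follow the public OpenAI API, and the exact fields changed by this PR are truncated above:

```json
{
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "stream": false
}
```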
This also updates the loading process to track loading of shards instead of tensors. This will enable loading in Jupyter without being rate limited and hanging.
Dynamic LoRA swapping, first raised in #259, enables the user to dynamically set active LoRA adapters. This can be configured per-request to enable users to add their own routing functionality....
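The per-request routing idea could be sketched as follows. This is a minimal illustration, not the mistral.rs API; the routing table and adapter names are hypothetical stand-ins for whatever mapping a user builds on top of per-request adapter activation:

```python
# Hypothetical routing table: map each user to the LoRA adapters that
# should be active when serving that user's requests.
USER_ADAPTERS = {
    "alice": ["sql-adapter"],
    "bob": ["code-adapter", "docs-adapter"],
}

def adapters_for_request(user, default=("base",)):
    """Pick the adapter set to activate for this request.

    Unknown users fall back to the default adapter list, so every
    request resolves to some concrete set of active adapters.
    """
    return USER_ADAPTERS.get(user, list(default))
```

The point of the sketch is that routing stays entirely in user code: the server only needs to accept a list of adapter names per request.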
A feature allowing swapping LoRA adapters at runtime could reduce the overhead for running multiple specialized model adapters. This style could either facilitate serving different models to individual users (akin...
Love to see more Rust in the AI space. I work on a tool called cargo-dist that can help package up pre-built binaries and build installers, so it's easier for...
This PR adds support for our first multimodal model: Idefics 2 (https://huggingface.co/HuggingFaceM4/idefics2-8b)! **Implementation TODOs:** - [x] VisionTransformer - [x] Encoder - [x] Attention - [x] MLP - [x] VisionEmbeddings (pending...
There's some work being done to implement Infini-attention from https://arxiv.org/pdf/2404.07143. In a nutshell, it allows for essentially unlimited context length without incurring the quadratic penalty. There's a proof of...
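In the paper's formulation (sketched from https://arxiv.org/pdf/2404.07143; the notation follows the paper, not this codebase), each segment updates a compressive memory and retrieves from it in time linear in the segment length, where \(\sigma\) is the ELU + 1 nonlinearity:

```latex
M_s = M_{s-1} + \sigma(K)^{\top} V, \qquad
z_s = z_{s-1} + \sum_{t} \sigma(K_t), \qquad
A_{\mathrm{mem}} = \frac{\sigma(Q)\, M_{s-1}}{\sigma(Q)\, z_{s-1}}
```

Because the memory \(M_s\) has fixed size regardless of how many segments have been consumed, the context window can grow without the usual quadratic attention cost.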
**Describe the bug** Quantizing large models via in-situ quantization leads to out-of-memory issues, even though the final quantized version should fit in VRAM. **Latest commit**...
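A minimal sketch of why peak memory matters here, under the assumption that the OOM comes from holding full-precision and quantized copies of weights alive at the same time (the function and tensor layout are hypothetical, not the mistral.rs implementation): quantizing tensor by tensor and freeing each full-precision original as you go keeps the peak near one tensor's worth of overhead instead of a whole extra model.

```python
import numpy as np

def quantize_in_place(weights):
    """Quantize a dict of float32 tensors to int8, one tensor at a time.

    Each fp32 tensor is popped (and thus freed) before the next one is
    processed, so the fp32 and int8 copies of the full model never
    coexist in memory.
    """
    quantized = {}
    for name in list(weights):
        w = weights.pop(name)                     # drop the fp32 copy as we go
        scale = float(np.abs(w).max()) / 127 or 1.0  # guard all-zero tensors
        quantized[name] = (np.round(w / scale).astype(np.int8), scale)
    return quantized
```

Usage: `quantize_in_place({"layer": np.ones((4, 4), dtype=np.float32)})` returns int8 tensors plus per-tensor scales, and the input dict ends up empty.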