mistral.rs
Blazingly fast LLM inference.
**Describe the bug** Models seem to produce garbled output on very long prompts. If I use the following script: ```python import openai from transformers import AutoTokenizer if __name__ == "__main__":...```
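As an illustration (a minimal sketch, not the reporter's truncated script), a reproduction along these lines would build an overly long prompt, count its tokens with `AutoTokenizer`, and send it to the OpenAI-compatible server; the endpoint URL, model ids, and prompt length are assumptions.

```python
# Minimal sketch of a long-prompt reproduction; the endpoint, model ids,
# and prompt length are assumptions, not the reporter's original script.
import openai
from transformers import AutoTokenizer

if __name__ == "__main__":
    # mistral.rs exposes an OpenAI-compatible API; port/key are placeholders.
    client = openai.OpenAI(base_url="http://localhost:1234/v1", api_key="EMPTY")

    # Build a very long prompt by repeating a filler sentence.
    prompt = "The quick brown fox jumps over the lazy dog. " * 2000

    # Count tokens locally to confirm the prompt really is very long.
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
    print("prompt tokens:", len(tokenizer(prompt).input_ids))

    # The garbled text reportedly shows up in the completion below.
    resp = client.chat.completions.create(
        model="mistral",
        messages=[{"role": "user", "content": prompt + "\nSummarize the text above."}],
        max_tokens=128,
    )
    print(resp.choices[0].message.content)
```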
If this works, we can extend it to the other models. Hopefully, this will fix the problem in #339 for models without sliding window attention.
## Describe the bug If the number of device layers exceeds the model's, then the number of host layers to assign seems to wrap/overflow instead of the expected `0`. **NOTE:** With `llama-cpp`...
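The expected behaviour amounts to a clamped subtraction; the sketch below uses hypothetical names (the crate's real code is Rust) just to show the intended arithmetic.

```python
# Hypothetical illustration of the expected host-layer calculation;
# the function name and signature are made up for this example.
def host_layers(total_model_layers: int, requested_device_layers: int) -> int:
    # Clamp at zero: asking for more device layers than the model has
    # should put every layer on the device and none on the host.
    return max(total_model_layers - requested_device_layers, 0)

# More device layers requested than the model has:
print(host_layers(32, 40))  # expected 0, not a wrapped/overflowed count
```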
It would be nice to have a stable (or versioned) C API and a way to compile shared and static libraries so one can create bindings for various other languages....
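To make the request concrete, bindings over such a C API might look like the `ctypes` sketch below; every library and symbol name here is hypothetical, since no C API exists yet.

```python
# Entirely hypothetical sketch of Python bindings over a stable C API.
# Neither libmistralrs_c nor these symbols exist today.
import ctypes

lib = ctypes.CDLL("libmistralrs_c.so")  # hypothetical shared library

# Hypothetical signatures for a minimal load-and-generate API.
lib.mistralrs_load_gguf.argtypes = [ctypes.c_char_p]
lib.mistralrs_load_gguf.restype = ctypes.c_void_p
lib.mistralrs_generate.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
lib.mistralrs_generate.restype = ctypes.c_char_p

model = lib.mistralrs_load_gguf(b"Phi-3-mini-4k-instruct-Q6_K.gguf")
print(lib.mistralrs_generate(model, b"Hello!").decode())
```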
**Describe the bug** Running a model from a GGUF file using [llama.cpp](https://github.com/ggerganov/llama.cpp) is very straightforward, just like this: `server -v -ngl 99 -m Phi-3-mini-4k-instruct-Q6_K.gguf`, and if the model is supported, it just...
**Describe the bug** It does not support some older hardware. Can it just convert bfloat16 to float16 before loading the model, just like vLLM does?
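What is being asked for amounts to a dtype cast at load time; the torch snippet below is only a conceptual sketch of that conversion, not how mistral.rs or vLLM implement it.

```python
# Conceptual sketch: cast bfloat16 weights to float16 for GPUs without
# native bf16 support. Not mistral.rs internals.
import torch

bf16_weight = torch.randn(4, 4, dtype=torch.bfloat16)

# Straight cast; bf16 has a wider exponent range than fp16, so extreme
# values can overflow to inf or underflow to 0 after conversion.
fp16_weight = bf16_weight.to(torch.float16)

print(bf16_weight.dtype, "->", fp16_weight.dtype)
```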
## Describe the bug After building `mistral.rs` with the `cuda` feature and testing it with `mistralrs-bench` and a local GGUF, I observed via `nvidia-smi` that layers were allocated to vRAM,...
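One way to confirm programmatically that layers actually landed in vRAM, instead of watching `nvidia-smi`, is an NVML query like the sketch below; this is only a monitoring aid, unrelated to mistral.rs internals.

```python
# Sketch: query GPU memory usage via NVML (pynvml) while mistralrs-bench
# is running, as a stand-in for watching nvidia-smi.
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust the index as needed
info = nvmlDeviceGetMemoryInfo(handle)
print(f"vRAM used: {info.used / 2**20:.0f} MiB of {info.total / 2**20:.0f} MiB")
```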
Bug: I am attempting to run mistral.rs for inference on my own GGUF files, but before that I wanted to test with the example given in the documentation. I...