mistral.rs
Blazingly fast LLM inference.
## Describe the bug Running Llama 3.2 on my MacBook Pro M3 Max (128 GB) with ``` cargo run --release --features metal -- --port 1234 vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a...
## Describe the bug I got an error in directory `/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs/examples`. My machine environment: ``` ProductName: macOS ProductVersion: 14.4.1 Hardware Overview: Model Name: MacBook Pro Model Identifier: MacBookPro18,4 Model Number: Z15H0016ZJ/A Chip: Apple...
This occurs when using two GPUs, but not when I use just one. I made sure to update to the Docker image used in the Dockerfile....
How to deploy mistralrs on Android for large model inference?
No pre-built `mistralrs-server` binaries under assets in the [0.3.0 GitHub release](https://github.com/EricLBuehler/mistral.rs/releases/tag/v0.3.0), very sad 😢
I use stop words/sequences to determine the next step after a response: if the LLM returns stop word A, we perform action X. I also do the same with other...
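A minimal client-side sketch of this dispatch pattern, assuming a locally running `mistralrs-server` (e.g. started with `--port 1234` as in the excerpt above) exposing its OpenAI-compatible chat completions endpoint; the stop words `STOP_A`/`STOP_B`, the `default` model name, and the actions are hypothetical placeholders:

```rust
// Assumed Cargo.toml dependencies:
//   reqwest = { version = "0.12", features = ["blocking", "json"] }
//   serde_json = "1"
use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical stop words; each one maps to a different follow-up action.
    let stops = ["STOP_A", "STOP_B"];

    // Assumes mistralrs-server is running locally and serving an
    // OpenAI-compatible chat completions endpoint on port 1234.
    let resp: Value = reqwest::blocking::Client::new()
        .post("http://localhost:1234/v1/chat/completions")
        .json(&json!({
            "model": "default",
            "messages": [{"role": "user", "content": "..."}],
        }))
        .send()?
        .json()?;

    let text = resp["choices"][0]["message"]["content"]
        .as_str()
        .unwrap_or_default();

    // An OpenAI-style server strips a matched `stop` sequence from the
    // returned text, so this sketch requests the full completion and scans
    // for the stop words client-side to decide which action to perform.
    let first_hit = stops
        .iter()
        .filter_map(|s| text.find(*s).map(|i| (i, *s)))
        .min_by_key(|(i, _)| *i);

    match first_hit {
        Some((_, "STOP_A")) => println!("stop word A seen: perform action X"),
        Some((_, "STOP_B")) => println!("stop word B seen: perform action Y"),
        _ => println!("no stop word in response"),
    }

    Ok(())
}
```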
Hello, llama.cpp recently added support for an AArch64-specific type of GGUF and AArch64-specific matmul kernels. Here is the merged PR: https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660 Namely Q4_0_8_8, Q4_0_4_8, and the more generic Q4_0_4_4...