mistral.rs
Python metal package runs on CPU
Describe the bug
Even with mistralrs-metal (or mistralrs-accelerate) installed, the model runs on the CPU. The log output during the run indicates this, and inference takes exactly as long as it does with the package built without accelerators.
Version
Package from PyPI: mistralrs-metal==0.1.24. I am running on an M1 Pro MacBook with macOS 14.5 (23F79).
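To confirm which build is actually installed, a minimal sketch like the one below lists every installed distribution whose name starts with `mistralrs`, along with the machine architecture. It uses only the Python standard library. The helper name `installed_mistralrs_variants` is hypothetical and not part of mistral.rs. The check assumes that the variants are all imported as `mistralrs`, as in the example below, so a second variant installed alongside mistralrs-metal could presumably shadow it.

```python
import platform
from importlib import metadata


def installed_mistralrs_variants():
    """Return {distribution name: version} for installed mistralrs* packages.

    Hypothetical helper for this report; it only inspects package metadata
    and does not import or query mistralrs itself.
    """
    found = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if name and name.lower().startswith("mistralrs"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # "arm64" is expected on Apple Silicon when running a native interpreter.
    print("machine:", platform.machine())
    print("macOS:", platform.mac_ver()[0] or "n/a")
    print("mistralrs packages:", installed_mistralrs_variants())
```

If more than one `mistralrs*` entry shows up, uninstalling all of them and reinstalling only mistralrs-metal would rule out a mixed install as the cause.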
Example
from mistralrs import Architecture, ChatCompletionRequest, Runner, Which

runner = Runner(
    which=Which.Plain(
        model_id="microsoft/Phi-3-mini-4k-instruct",
        arch=Architecture.Phi3,
        tokenizer_json=None,
        repeat_last_n=64,
    )
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[{"role": "user", "content": "Hi! How to solve hanoi towers?"}],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.7,
    )
)
print(res.choices[0].message.content)