mistral.rs Python metal package runs on CPU

Open KaQuMiQ opened this issue 7 months ago • 2 comments

Describe the bug: Even with mistralrs-metal (or mistralrs-accelerate) installed, the model runs on the CPU. The log output during the run indicates this, and generation takes exactly the same amount of time as with the package built without accelerators.

Version: Package from PyPI, mistralrs-metal==0.1.24. I am running on an M1 Pro MacBook with macOS 14.5 (23F79).
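To rule out a mixed install, it may help to list every installed distribution whose name starts with "mistralrs". This is only a diagnostic sketch using the standard library. The helper name `installed_variants` is hypothetical, and it rests on the assumption that the variant packages share the same `mistralrs` import name, so one install could shadow another.

```python
from importlib import metadata


def installed_variants(prefix: str = "mistralrs") -> dict[str, str]:
    """Map each installed distribution whose name starts with `prefix` to its version.

    Hypothetical helper for diagnosis only; it does not inspect which
    backend the imported module was actually built with.
    """
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    # More than one entry here would suggest overlapping installs.
    for name, version in sorted(installed_variants().items()):
        print(f"{name}=={version}")
```

If more than one variant shows up, uninstalling all of them and reinstalling only mistralrs-metal would be a reasonable first check.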

Example

from mistralrs import Architecture, ChatCompletionRequest, Runner, Which

runner = Runner(
    which=Which.Plain(
        model_id="microsoft/Phi-3-mini-4k-instruct",
        arch=Architecture.Phi3,
        tokenizer_json=None,
        repeat_last_n=64,
    )
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[{"role": "user", "content": "Hi! How to solve hanoi towers?"}],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.7,
    )
)
print(res.choices[0].message.content)
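The "same amount of time" comparison can be made concrete by timing the request under each installed package. The sketch below uses only the standard library. The `timed` helper is a hypothetical name, not part of mistralrs, and the commented usage assumes the `runner` and request from the example above.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Assumed usage with the objects from the example above:
# res, seconds = timed(runner.send_chat_completion_request, request)
# print(f"completion took {seconds:.2f} s")
```

Running this once with the plain package and once with mistralrs-metal, on the same prompt and settings, gives two numbers that can be posted alongside the logs.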

Jul 08 '24 08:07 KaQuMiQ
