# Very slow on Mac M3
## 🐛 Bug
When I ask the first few questions in a chat, the app (with the Mistral 7B model) generates answers very fast, about 2000 tokens/min. But at some point it starts to slow down, and I get only about 5 tokens/min. The same model runs consistently fast in another solution, LLMFarm. Even the Gemma model works fast only for the first 10-20 questions; after that it also slows down to about 5 tokens/min.
## To Reproduce
Steps to reproduce the behavior:
- Download MLC Chat from the Mac App Store on a Mac with an M3 chip
- Choose the Mistral 7B model
- Ask a few questions in chat; after 10-20 of them, generation slows down dramatically (see the sketch below for a scripted equivalent)
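
For anyone who wants to check whether the slowdown also appears outside the App Store build, here is a minimal sketch using the `mlc_llm` Python package's `MLCEngine` API to time each turn of a growing conversation. This is not the original reporter's setup; the model id below is an assumption, and the stream-chunk count is only a rough proxy for token throughput.

```python
# Hypothetical reproduction sketch, not part of the original report.
# Assumes the mlc-llm Python package is installed and that the
# mlc-ai Mistral 7B weights named below exist; adjust the id as needed.
import time

from mlc_llm import MLCEngine

model = "HF://mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC"  # assumed model id
engine = MLCEngine(model)

messages = []
for turn in range(30):  # enough turns to pass the reported 10-20 threshold
    messages.append(
        {"role": "user", "content": f"Question {turn}: briefly explain entropy."}
    )
    reply, chunks = "", 0
    start = time.time()
    # Stream the answer so we can time generation, not just the final text.
    for response in engine.chat.completions.create(
        messages=messages, model=model, stream=True
    ):
        for choice in response.choices:
            reply += choice.delta.content or ""
            chunks += 1  # stream chunks: a rough proxy for generated tokens
    elapsed = time.time() - start
    messages.append({"role": "assistant", "content": reply})
    print(f"turn {turn:2d}: {chunks / elapsed:6.1f} chunks/s over {elapsed:.1f}s")

engine.terminate()
```

If the per-turn rate collapses here as well, the regression would seem to be in the engine rather than in the Chat app's UI.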
## Environment
- Operating system (e.g. Ubuntu/Windows/MacOS/...): macOS
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): MacBook Pro (M3)
- How you installed MLC-LLM (`conda`, source): Mac App Store