# Very slow on Mac M3
## 🐛 Bug
When I ask the first few questions in a chat, the app (with the Mistral 7B model) generates answers very fast, about 2000 tokens/min. But at some point it starts to slow down, and I get only about 5 tokens/min. The same model runs consistently fast in another solution, LLMFarm. Even the Gemma model works fast only for the first 10-20 questions; after that it also slows down to about 5 tokens/min.
## To Reproduce
Steps to reproduce the behavior:
- Download MLC Chat from the Mac App Store on a Mac with an M3 chip
- Choose the Mistral 7B model
- Ask a few questions in chat; after 10-20 of them, generation slows down dramatically (see the sketch below for a scripted equivalent)
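
For anyone who wants to check whether the slowdown also appears outside the App Store build, here is a minimal sketch using the `mlc_llm` Python package's `MLCEngine` API to time each turn of a growing conversation. This is not the original reporter's setup; the model id below is an assumption, and the stream-chunk count is only a rough proxy for token throughput.

```python
# Hypothetical reproduction sketch, not part of the original report.
# Assumes the mlc-llm Python package is installed and that the
# mlc-ai Mistral 7B weights named below exist; adjust the id as needed.
import time

from mlc_llm import MLCEngine

model = "HF://mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC"  # assumed model id
engine = MLCEngine(model)

messages = []
for turn in range(30):  # enough turns to pass the reported 10-20 threshold
    messages.append(
        {"role": "user", "content": f"Question {turn}: briefly explain entropy."}
    )
    reply, chunks = "", 0
    start = time.time()
    # Stream the answer so we can time generation, not just the final text.
    for response in engine.chat.completions.create(
        messages=messages, model=model, stream=True
    ):
        for choice in response.choices:
            reply += choice.delta.content or ""
            chunks += 1  # stream chunks: a rough proxy for generated tokens
    elapsed = time.time() - start
    messages.append({"role": "assistant", "content": reply})
    print(f"turn {turn:2d}: {chunks / elapsed:6.1f} chunks/s over {elapsed:.1f}s")

engine.terminate()
```

If the per-turn rate collapses here as well, the regression would seem to be in the engine rather than in the Chat app's UI.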
## Environment
- Operating system (e.g. Ubuntu/Windows/MacOS/...): macOS
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): MacBook Pro (M3)
- How you installed MLC-LLM (`conda`, source): Mac App Store