
very slow on Mac m3

Open aiakubovich opened this issue 6 months ago • 2 comments

🐛 Bug

For the first few questions in a chat, the app (with the Mistral 7B model) generates answers very quickly, at about 2000 tokens/min. At some point it starts to slow down, and I get only about 5 tokens/min. The same model runs very fast with another solution, LLMFarm. Even the Gemma model works well only for the first 10-20 questions; after that it also slows down to about 5 tokens/min.

To Reproduce

Steps to reproduce the behavior:

  1. Download MLC Chat from the App Store onto a Mac with an M3 chip
  2. Choose the Mistral 7B model
  3. Ask a few questions in the chat

Environment

  • Operating system (e.g. Ubuntu/Windows/MacOS/...): macOS
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): MacBook Pro (M3)
  • How you installed MLC-LLM (conda, source): App Store

aiakubovich, Aug 09 '24 00:08