mlc-llm

Published 20 hours ago

Perf: load weights, create KV cache, initialize tokenizer in parallel

Open Bekaboo opened this issue 6 months ago • 0 comments

Use multiple threads to load the weights, create the KV cache, and initialize the tokenizer in parallel. This should slightly improve initialization time and TTFT (time to first token).
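
A minimal sketch of the idea, assuming the three steps are independent of each other. The functions `load_weights`, `create_kv_cache`, and `init_tokenizer` are hypothetical stand-ins that only sleep to simulate work; they are not mlc-llm APIs, and the real engine may have ordering constraints this sketch ignores.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three init steps (not mlc-llm APIs).
# Each one sleeps to simulate I/O-bound or native work.
def load_weights():
    time.sleep(0.1)
    return "weights"

def create_kv_cache():
    time.sleep(0.1)
    return "kv_cache"

def init_tokenizer():
    time.sleep(0.1)
    return "tokenizer"

STEPS = (load_weights, create_kv_cache, init_tokenizer)

def init_sequential():
    # Baseline: run the steps one after another.
    return tuple(step() for step in STEPS)

def init_parallel():
    # Submit all three steps at once, then wait for each result.
    with ThreadPoolExecutor(max_workers=len(STEPS)) as pool:
        futures = [pool.submit(step) for step in STEPS]
        # result() blocks until done and re-raises worker exceptions.
        return tuple(f.result() for f in futures)

def timed(fn):
    # Return (result, elapsed seconds) for a zero-argument callable.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

if __name__ == "__main__":
    _, t_seq = timed(init_sequential)
    _, t_par = timed(init_parallel)
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With threads, total time approaches that of the slowest step rather than the sum of all three, but only if the steps spend their time in I/O or in native code that does not hold a shared lock. How much this helps in practice would need to be measured.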

Apr 27 '25 02:04 Bekaboo