
Perf: load weights, create KV cache, initialize tokenizer in parallel

Open Bekaboo opened this issue 6 months ago • 0 comments

Use multiple threads to load the weights, create the KV cache, and initialize the tokenizer in parallel. This should slightly improve initialization time and time to first token (TTFT).
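
To illustrate the idea, here is a minimal Python sketch rather than actual MLC LLM code. The functions `load_weights`, `create_kv_cache`, and `init_tokenizer` are hypothetical stand-ins that only sleep, and the sketch assumes the three steps are independent of each other, which may not hold if, for example, the KV cache size depends on memory left after the weights are loaded.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three initialization steps.
# These are NOT MLC LLM APIs; each one just simulates work by sleeping.
def load_weights():
    time.sleep(0.3)
    return "weights"


def create_kv_cache():
    time.sleep(0.2)
    return "kv_cache"


def init_tokenizer():
    time.sleep(0.1)
    return "tokenizer"


def init_sequential():
    # Baseline: total time is roughly the sum of the three steps.
    return load_weights(), create_kv_cache(), init_tokenizer()


def init_parallel():
    # Run each step on its own thread; total time is roughly the slowest step.
    steps = (load_weights, create_kv_cache, init_tokenizer)
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step) for step in steps]
        # result() blocks until the step finishes and re-raises its exception, if any.
        return tuple(future.result() for future in futures)


def timed(fn):
    # Return the function's result together with its wall-clock duration in seconds.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

Calling `timed(init_sequential)` and `timed(init_parallel)` gives a rough comparison. The gain is bounded by the slowest step, so if weight loading dominates, the improvement would be small, which matches the "slightly" above.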


Bekaboo, Apr 27 '25 02:04