mlc-llm
Perf: load weights, create KV cache, initialize tokenizer in parallel
Use multiple threads to load the weights, create the KV cache, and initialize the tokenizer in parallel. This should slightly reduce initialization time and time to first token (TTFT).
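For illustration only, here is a minimal Python sketch of the general pattern, not the code in this PR. The functions `load_weights`, `create_kv_cache`, and `init_tokenizer` are hypothetical stand-ins, the sleeps only simulate work, and the sketch assumes the three steps do not depend on each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the three setup steps. The sleeps simulate
# blocking work and are not measurements taken from mlc-llm.
def load_weights():
    time.sleep(0.2)
    return "weights"


def create_kv_cache():
    time.sleep(0.1)
    return "kv_cache"


def init_tokenizer():
    time.sleep(0.1)
    return "tokenizer"


STEPS = (load_weights, create_kv_cache, init_tokenizer)


def init_sequential():
    # Baseline: run each step one after another.
    return tuple(step() for step in STEPS)


def init_parallel():
    # Submit each independent step to its own worker thread, then wait
    # for all of them; results come back in submission order.
    with ThreadPoolExecutor(max_workers=len(STEPS)) as pool:
        futures = [pool.submit(step) for step in STEPS]
        return tuple(future.result() for future in futures)


def timed(fn):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

With these simulated durations, the parallel variant should finish in roughly the time of the slowest step rather than the sum of all three. The real gain depends on how much of each step is I/O-bound or otherwise able to overlap, which is consistent with the "slightly" in the description.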