How to Reduce Model Initialization Time?
Hello,
Is there any way to include and utilize the model library directly from the project folder to reduce the initialization time? Or any other ideas? I don't really understand what is going on under the hood for initialization, so any relevant information about that would be appreciated as well.
This is a great project, thank you!
Thanks for the question! Under the hood, the weights of the selected model are downloaded from the model_url field (a Hugging Face link) in a model record:
https://github.com/mlc-ai/web-llm/blob/a3ff97c50025b87fdc6effa87c8a8abaca73217c/examples/get-started/src/get_started.ts#L22-L24
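For illustration, a model record is essentially a pairing of a local identifier with a download URL. The sketch below is a simplified, hypothetical shape based on the linked get-started example; the exact interface may differ across web-llm versions, and the URL is a placeholder, not a real model.

```typescript
// Hypothetical sketch of a model record (field names follow the
// get-started example linked above; the URL is a placeholder).
interface ModelRecord {
  model_url: string; // where the weights are fetched from on first load
  local_id: string;  // name used to select this model at runtime
}

const record: ModelRecord = {
  model_url: "https://huggingface.co/my-org/my-model/resolve/main/",
  local_id: "my-model-q4f16",
};
```

At initialization, the engine looks up the record matching the requested local_id and streams the weight shards from model_url.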
After the first download, the model is cached, so initialization is faster in subsequent runs (even after refreshes, etc.).
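The download-once-then-cache behaviour can be sketched with a toy memoized downloader. This is only an illustration of the idea; web-llm's actual cache is the browser's Cache API, which is what lets the weights persist across page refreshes.

```typescript
// Toy sketch: fetch weights from the network only on the first request,
// then serve every later request from an in-memory cache. (web-llm's
// real cache is the browser Cache API, not a Map.)
const weightCache = new Map<string, Uint8Array>();

async function fetchWeights(
  url: string,
  download: (u: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const hit = weightCache.get(url);
  if (hit !== undefined) return hit; // later runs: no network needed
  const data = await download(url);  // first run: pay the download cost
  weightCache.set(url, data);
  return data;
}
```

With this shape, only the very first initialization pays the network cost; every subsequent one reads the cached bytes.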
Related to this, you could check out this example for loading the model from disk; it is equivalent to simple-chat-ts except for the upload feature: https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload. This would help save the download time.
Hello, I am also concerned about this issue. I have noticed that after a model is compiled, its parameter file (.bin) is split into multiple shards, which are then loaded sequentially from the cache into video memory. I would like to know the reason for splitting the model parameters. Would keeping them unsplit result in faster loading times?