
How to Reduce Model Initialization Time?

Open itsaliu90 opened this issue 1 year ago • 3 comments

Hello,

Is there any way to include and utilize the model library directly from the project folder to reduce the initialization time? Or any other ideas? I don't really understand what is going on under the hood for initialization, so any relevant information about that would be appreciated as well.

This is a great project, thank you!

itsaliu90 avatar Jan 27 '24 01:01 itsaliu90

Thanks for the question! Under the hood, the weights of the selected model are downloaded from the `model_url` field (a Hugging Face link) in a model record: https://github.com/mlc-ai/web-llm/blob/a3ff97c50025b87fdc6effa87c8a8abaca73217c/examples/get-started/src/get_started.ts#L22-L24

After the first download, the model is cached, so initialization is faster in subsequent runs (even after refreshes, etc.).
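To serve the weights from your own project folder instead of Hugging Face, you can point `model_url` at a locally hosted path in the app config. Below is a minimal sketch, not a definitive setup: the `model_list`, `local_id`, and `model_lib_url` field names follow the get-started example linked above and may differ across web-llm versions, and the local paths are hypothetical.

```typescript
import * as webllm from "@mlc-ai/web-llm";

// Sketch: a custom app config whose model_url points at weight shards
// copied into the project (e.g. under /public and served by your dev
// server), so the first load skips the Hugging Face download.
const appConfig = {
  model_list: [
    {
      // Locally hosted weight shards instead of a huggingface.co URL
      model_url: "/models/Llama-2-7b-chat-hf-q4f32_1/",
      local_id: "Llama-2-7b-chat-hf-q4f32_1",
      // The compiled model library (WASM) can also be hosted locally
      model_lib_url: "/models/Llama-2-7b-chat-hf-q4f32_1/model.wasm",
    },
  ],
};

// Pass the config when reloading, as in the get-started example:
// await chat.reload("Llama-2-7b-chat-hf-q4f32_1", undefined, appConfig);
```

Note this only removes the network round-trip to Hugging Face on first load; the per-shard fetch and upload into GPU memory still happens on every cold start.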

CharlieFRuan avatar Jan 27 '24 03:01 CharlieFRuan

Related to this, you could check out this example for loading the model from disk; it is equivalent to simple-chat-ts except for the upload feature: https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload. This would help save the download time.

CharlieFRuan avatar Jun 04 '24 19:06 CharlieFRuan

Hello, I am also concerned about this issue. I have noticed that after a model is compiled, its parameter file (.bin) is split into multiple shards, which are then loaded sequentially from cache into video memory. I would like to know the reason for splitting the model parameters. Wouldn't leaving them unsplit result in faster loading times?

137591 avatar Jun 24 '24 14:06 137591