
Feature Request: Provide progress callback for loading model from cache

Open kentcdodds opened this issue 5 months ago • 6 comments

Currently, initProgressCallback only reports progress when the model is being downloaded from the network. However, when the model is already cached in the browser (via CacheStorage), the user receives no feedback, even though loading and initializing the model from cache can still take multiple seconds.

I’d like to request a new callback, e.g., cacheLoadProgressCallback, that reports progress as each cached shard is read and processed.

This would allow developers to show meaningful progress indicators during both cold and warm starts.

Possible implementation ideas:

  • Trigger the callback as each shard is read from CacheStorage and its response body is read via arrayBuffer()
  • Report total bytes loaded from cache vs total expected
  • Or simply provide a boolean flag indicating whether loading is from cache or network
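To make the three ideas above concrete, here is a minimal sketch of what such a callback could report. Everything in it is hypothetical: `CacheLoadProgress`, `CacheLoadProgressCallback`, and `createShardProgressTracker` are illustrative names, not part of WebLLM, and the sketch assumes the shard sizes are known up front (for example from a manifest). The loader would call the returned function once per shard after reading it from CacheStorage.

```typescript
// Hypothetical report shape; none of these names exist in WebLLM.
interface CacheLoadProgress {
  fromCache: boolean;   // idea 3: flag for cache vs. network
  shardsLoaded: number; // idea 1: one update per shard read
  shardsTotal: number;
  bytesLoaded: number;  // idea 2: bytes read so far vs. total expected
  bytesTotal: number;
  fraction: number;     // bytesLoaded / bytesTotal, capped at 1
}

type CacheLoadProgressCallback = (report: CacheLoadProgress) => void;

// Returns a function to call each time the shard at `shardIndex`
// has been fully read. Assumes shard sizes are known in advance.
function createShardProgressTracker(
  shardSizes: number[],
  fromCache: boolean,
  callback: CacheLoadProgressCallback,
): (shardIndex: number) => void {
  const bytesTotal = shardSizes.reduce((sum, size) => sum + size, 0);
  let shardsLoaded = 0;
  let bytesLoaded = 0;

  return (shardIndex: number): void => {
    shardsLoaded += 1;
    bytesLoaded += shardSizes[shardIndex] ?? 0;
    // Treat an empty model (0 bytes) as complete to avoid dividing by zero.
    const fraction = bytesTotal > 0 ? Math.min(bytesLoaded / bytesTotal, 1) : 1;
    callback({
      fromCache,
      shardsLoaded,
      shardsTotal: shardSizes.length,
      bytesLoaded,
      bytesTotal,
      fraction,
    });
  };
}
```

Reporting bytes rather than only a shard count keeps the bar smooth when shards differ in size, and the `fromCache` flag lets the UI choose between "Downloading" and "Loading from cache" labels.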

Thanks for the awesome work on WebLLM!

kentcdodds avatar Jul 23 '25 20:07 kentcdodds

I am pretty sure this did work in the past.

nico-martin avatar Jul 24 '25 20:07 nico-martin

Thanks for the issue! If I understand your request correctly, this should indeed already work. For example, in chat.webllm.ai you can see the following status message: "loading model from cache"

Image

CharlieFRuan avatar Jul 24 '25 20:07 CharlieFRuan

Yes, but the problem is that it's always 0%. I guess you could argue that this makes sense, since 0 bytes are loaded from the network. But users don't really care whether it's loaded from the network or from cache; they just want to know the progress.

nico-martin avatar Jul 24 '25 20:07 nico-martin

Ah you're right. Likely a bug in https://github.com/apache/tvm/blob/8a914e58925557741aca6d7453e5d94004254079/web/src/runtime.ts#L1316

CharlieFRuan avatar Jul 24 '25 20:07 CharlieFRuan

Not ideal, @kentcdodds, but as a workaround you can parse the [xx/yy] from the status text. See the _findPercentCompleteFromStatus() function at the top of https://github.com/DecentAppsNet/decentapp-template/blob/main/src/loadScreen/interactions/initialization.ts
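For anyone who wants the idea without opening the link, a rough sketch of that kind of parsing might look like the following. It is not the code from the linked file: `parseShardProgress`, `effectiveProgress`, and `ProgressReportLike` are made-up names, and it assumes the status text contains a shard counter of the form `[done/total]` and that the report passed to initProgressCallback carries `progress` and `text` fields.

```typescript
// Assumed minimal shape of the report given to initProgressCallback.
interface ProgressReportLike {
  progress: number; // reported fraction; stuck at 0 when loading from cache
  text: string;     // status text, assumed to contain "[done/total]"
}

// Extracts done/total from the first "[xx/yy]" in the text.
// Returns a fraction in [0, 1], or null if no usable counter is found.
function parseShardProgress(text: string): number | null {
  const match = /\[(\d+)\/(\d+)\]/.exec(text);
  if (match === null) return null;
  const done = Number(match[1]);
  const total = Number(match[2]);
  if (total <= 0) return null;
  return Math.min(done / total, 1);
}

// Prefers the reported progress when it is non-zero, and otherwise
// falls back to the counter parsed from the status text.
function effectiveProgress(report: ProgressReportLike): number {
  if (report.progress > 0) return report.progress;
  return parseShardProgress(report.text) ?? 0;
}
```

You would then call `effectiveProgress(report)` inside your initProgressCallback and feed the result to the progress bar. Since this depends on the wording of the status text, it can break if that text changes, so treat it as a stopgap until the upstream fix lands.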

erikh2000 avatar Aug 16 '25 22:08 erikh2000

I created a PR to solve this in apache/tvm. In the meantime, a workaround is to use what @erikh2000 posted above.

insertmike avatar Nov 14 '25 12:11 insertmike