Charlie Ruan
Update: both the Cache API and IndexedDB cache are supported as of `0.2.31`. Users can choose either via `AppConfig.useIndexedDBCache`. For more, see the PR: https://github.com/mlc-ai/web-llm/pull/352. We also exposed some cache-related...
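A minimal sketch of that option, assuming the field names above; the surrounding config shape (e.g. `model_list`) is included only for illustration:

```ts
import type { AppConfig } from "@mlc-ai/web-llm";

// Sketch: opt into the IndexedDB cache instead of the default Cache API.
// `useIndexedDBCache` comes from the comment above; the rest of the config
// is assumed for illustration.
const appConfig: AppConfig = {
  model_list: [
    /* ...your models... */
  ],
  useIndexedDBCache: true, // false (the default) keeps the Cache API
};
```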
Closing; feel free to open a new issue if problems persist
Really appreciate your input here, @finom! Function calling is part of our roadmap, as shown in item O2 here: https://github.com/mlc-ai/web-llm/issues/276. The goal is to support OpenAI-like APIs and features (function...
Hi! Please check out `examples/function-calling`. Function calling is supported as of `0.2.41`, via PR https://github.com/mlc-ai/web-llm/pull/451. Currently only `Hermes-2-Pro` models are supported:
- https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B
- https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B
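A hedged sketch of what such a call might look like through the OpenAI-style API; the model id string and the tool schema below are illustrative assumptions, not taken from the example:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Illustrative sketch: the model id and tool definition are assumptions.
const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is the weather in Tokyo?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
});

// With tools provided, the model should answer via tool_calls rather than text.
console.log(reply.choices[0].message.tool_calls);
```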
Thanks for reporting the issue; this seems to be an out-of-memory problem (f32 KV cache plus 13B params): llama-2-7b-q4f32_1 requires roughly 9 GB, while 13b-q4f16_1 requires roughly 10 GB. How...
Similar VK_ERROR_OUT_OF_DEVICE_MEMORY issue was reported in mlc-llm: https://github.com/mlc-ai/mlc-llm/issues/974
I know TVM can capture OOM for other backends (e.g. for [Vulkan here](https://github.com/apache/tvm/blob/521465e6268ff50c441b6b7eea8eff8579dc6ff2/src/runtime/vulkan/vulkan_buffer.cc#L61)). I'm not sure what the situation is for WebGPU. I'll make another attempt this week;...
@beaufortfrancois I tried to catch the error from `CreateBuffer()` by adding [popErrorScope()](https://www.w3.org/TR/webgpu/#dom-gpudevice-poperrorscope) in the three places it is called in https://github.com/apache/tvm/blob/main/web/src/webgpu.ts -- no luck with that. So I instead added...
Sorry for the delay, will take a look tonight
Quick update: it does seem that the error can be caught! I'm not sure whether I did something wrong earlier or whether there have been updates on the WebGPU side. Since my...
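For reference, a minimal sketch of the error-scope pattern this catching relies on; the wrapper function is hypothetical, while `pushErrorScope()`, `popErrorScope()`, and `createBuffer()` are from the WebGPU spec:

```ts
// Sketch: catch out-of-memory on buffer allocation via a WebGPU error scope.
async function createBufferChecked(
  device: GPUDevice,
  desc: GPUBufferDescriptor
): Promise<GPUBuffer> {
  // Capture only OOM errors raised by the enclosed allocation.
  device.pushErrorScope("out-of-memory");
  const buffer = device.createBuffer(desc);
  const error = await device.popErrorScope(); // resolves to GPUError | null
  if (error) {
    buffer.destroy();
    throw new Error(`WebGPU OOM while allocating buffer: ${error.message}`);
  }
  return buffer;
}
```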