
[Device] Catch WebGPU OOM error

Open CharlieFRuan opened this issue 1 year ago • 2 comments

Prior to this PR, when users call createEngine() or reload() with a model that is too large for the device, the device would likely keep generating, ignoring the OOM error and producing incorrect output. See https://github.com/mlc-ai/web-llm/issues/356 and https://github.com/mlc-ai/web-llm/issues/209.

This PR catches such errors with device.lost.then(), relying on tvmjs to call device.destroy() upon detecting an error in createBuffer() (via https://github.com/apache/tvm/pull/17005).
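For illustration, here is a minimal TypeScript sketch of the device.lost.then() pattern described above. This is not the PR's actual code: the `deviceLostError` variable and `watchDeviceLost` helper are hypothetical, and WebGPU types (GPUDevice, GPUDeviceLostInfo) are assumed to be available globally, e.g. via @webgpu/types.

```typescript
// Hypothetical module-level slot for an error detected asynchronously.
let deviceLostError: Error | undefined;

// Register a callback on the device-lost promise. When tvmjs sees a
// createBuffer() error and calls device.destroy(), `device.lost`
// resolves with reason "destroyed", and we record the failure here.
function watchDeviceLost(device: GPUDevice): void {
  device.lost.then((info: GPUDeviceLostInfo) => {
    if (info.reason === "destroyed") {
      deviceLostError = new Error(
        `WebGPU device was lost, likely due to OOM: ${info.message}`
      );
    }
  });
}
```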

We have only observed createBuffer() errors, so we only handle this kind of error for now. Besides, since most OOM errors occur during reload(), we make the error handling effectively synchronous despite using .then(): if an error was recorded, we throw it at the end of reload().
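A sketch of that synchronous rethrow, continuing the hypothetical names above (the real reload() does much more than this):

```typescript
// Hypothetical reload(): after loading weights and allocating buffers,
// rethrow any error recorded by the device-lost callback, so the caller
// of reload() observes the OOM failure directly rather than silently.
async function reload(modelId: string): Promise<void> {
  // ... fetch weights, allocate KV cache, initialize pipeline ...
  if (deviceLostError !== undefined) {
    const err = deviceLostError;
    deviceLostError = undefined; // reset for the next reload()
    throw err;
  }
}
```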

CharlieFRuan · May 17 '24 09:05

Example of trying to allocate a KV cache with a 900k context length (should be similar when trying to load a model that is too large):

[screenshot of the caught OOM error]

CharlieFRuan · May 17 '24 09:05

Marked as a draft for now, as it depends on https://github.com/apache/tvm/pull/17005.

CharlieFRuan · May 17 '24 09:05