Generate error, OperationError: Device lost during onSubmittedWorkDone (do not use this error for recovery - it is NOT guaranteed to happen on device loss)
Thank you for this project.
I'm using Linux and Chrome:
$ uname -a
Linux msi 6.1.80-1-MANJARO #1 SMP PREEMPT_DYNAMIC Fri Mar 1 18:09:53 UTC 2024 x86_64 GNU/Linux
$ google-chrome-stable --version
Google Chrome 124.0.6367.118
However, I got this error on the demo website:
Generate error, OperationError: Device lost during onSubmittedWorkDone (do not use this error for recovery - it is NOT guaranteed to happen on device loss)
Thanks for reporting the issue. A device-lost error is usually caused by running out of GPU memory; perhaps you can try a smaller model like Gemma-2B.
You can check the vram_required_MB field in webllm.prebuiltAppConfig: https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L251. Llama-3-8B-q4f32-1k requires ~5GB, which may exceed the VRAM limit of an RTX 3050 Ti.
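If it helps, here's a minimal sketch for listing the prebuilt models that fit within a given VRAM budget, assuming the model_list, model_id, and vram_required_MB fields as they appear in the linked config.ts:

```typescript
// Minimal sketch: list prebuilt models that should fit within a VRAM budget.
// Field names (model_list, model_id, vram_required_MB) are assumed from the
// linked config.ts; vram_required_MB may be missing for some entries.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const VRAM_BUDGET_MB = 4 * 1024; // e.g. an RTX 3050 Ti Laptop GPU

for (const m of prebuiltAppConfig.model_list) {
  if (m.vram_required_MB !== undefined && m.vram_required_MB <= VRAM_BUDGET_MB) {
    console.log(`${m.model_id}: ~${m.vram_required_MB} MB`);
  }
}
```

Anything it prints should be a safer starting point than the 5GB Llama-3 build.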
Thank you very much for your help. Following your guidance resolved the problem.
Here's my experience with the Llama-3-8B-q4f32-1k (5GB) model on my MSI Katana GF66 12UD laptop. It dual-boots Windows and Linux, has two video cards with both drivers correctly installed, and is equipped with 32GB of RAM.
| OS | Intel iGPU (ADL GT2, 16GB shared RAM) | Nvidia dGPU (RTX 3050 Ti Laptop, 4GB VRAM) |
|---|---|---|
| Windows | Loads the 5GB model into RAM. Shader-f16 is supported. Models run, and faster than on the dGPU. | Loads 4GB into VRAM; shared RAM usage does not increase. Shader-f16 is supported. Models run, but very slowly. |
| Linux | Loads the 5GB model into RAM. Shader-f16 is supported. Models run. | Loads the model into VRAM until it reaches 4GB, then the dGPU's VRAM usage drops to 0 and the error above is thrown. Shader-f16 is NOT supported (see the feature check below), so Gemma-2B can't run. I haven't found any model that runs on the dGPU under Linux, so I have to use the iGPU instead, but its speed is acceptable. |
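In case it's useful to anyone else debugging this, you can check which adapter Chrome picked and whether it exposes shader-f16 directly from the DevTools console. This is plain WebGPU, nothing web-llm specific; the snippet assumes top-level await (available in the console) and @webgpu/types if you compile it as TypeScript:

```typescript
// Query the WebGPU adapter and check shader-f16 support.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance", // hint the browser toward the dGPU
});
if (!adapter) throw new Error("WebGPU is not available");
console.log("shader-f16 supported:", adapter.features.has("shader-f16"));
```

If this logs false for the adapter the page actually uses, any f16 model (such as the Gemma-2B build mentioned above) will fail regardless of available VRAM.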