Generate error, OperationError: Device lost during onSubmittedWorkDone (do not use this error for recovery - it is NOT guaranteed to happen on device loss)
Thank you for this project.
I'm using Linux and Chrome:
$ uname -a
Linux msi 6.1.80-1-MANJARO #1 SMP PREEMPT_DYNAMIC Fri Mar 1 18:09:53 UTC 2024 x86_64 GNU/Linux
$ google-chrome-stable --version
Google Chrome 124.0.6367.118
However, I got this error on the demo website:
Generate error, OperationError: Device lost during onSubmittedWorkDone (do not use this error for recovery - it is NOT guaranteed to happen on device loss)
Thanks for reporting the issue. A device-lost error is usually caused by running out of GPU memory; perhaps you can try a smaller model like Gemma-2B.
You can check the vram_required_MB field in webllm.prebuiltAppConfig: https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L251. Llama-3-8B-q4f32-1k requires ~5GB, which may exceed the VRAM limit of an RTX 3050 Ti.
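If it helps, here's a minimal sketch for listing the prebuilt models that fit within a given VRAM budget, assuming the model_list, model_id, and vram_required_MB fields as they appear in the linked config.ts:

```typescript
// Minimal sketch: list prebuilt models that should fit within a VRAM budget.
// Field names (model_list, model_id, vram_required_MB) are assumed from the
// linked config.ts; vram_required_MB may be missing for some entries.
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

const VRAM_BUDGET_MB = 4 * 1024; // e.g. an RTX 3050 Ti Laptop GPU

for (const m of prebuiltAppConfig.model_list) {
  if (m.vram_required_MB !== undefined && m.vram_required_MB <= VRAM_BUDGET_MB) {
    console.log(`${m.model_id}: ~${m.vram_required_MB} MB`);
  }
}
```

Anything it prints should be a safer starting point than the 5GB Llama-3 build.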
Thank you very much for your help. Following your guidance resolved the problem.
Here's my experience with the Llama-3-8B-q4f32-1k (5GB) model on my MSI Katana GF66 12UD laptop. It dual-boots Windows and Linux, has two video cards with both drivers correctly installed, and is equipped with 32GB of RAM.
| OS | Intel iGPU (ADL GT2, 16GB shared RAM) | Nvidia dGPU (RTX 3050 Ti Laptop, 4GB VRAM) |
|---|---|---|
| Windows | Loads the 5GB model into RAM. Shader-f16 is supported. Models run, and faster than on the dGPU. | Loads 4GB into VRAM; shared RAM usage does not increase. Shader-f16 is supported. Models run, but very slowly. |
| Linux | Loads the 5GB model into RAM. Shader-f16 is supported. Models run. | Loads the model into VRAM until it reaches 4GB, then the dGPU's VRAM usage drops to 0 and the error above is thrown. Shader-f16 is NOT supported (see the feature check below), so Gemma-2B can't run. I haven't found any model that runs on the dGPU under Linux, so I have to use the iGPU instead, but its speed is acceptable. |
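In case it's useful to anyone else debugging this, you can check which adapter Chrome picked and whether it exposes shader-f16 directly from the DevTools console. This is plain WebGPU, nothing web-llm specific; the snippet assumes top-level await (available in the console) and @webgpu/types if you compile it as TypeScript:

```typescript
// Query the WebGPU adapter and check shader-f16 support.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: "high-performance", // hint the browser toward the dGPU
});
if (!adapter) throw new Error("WebGPU is not available");
console.log("shader-f16 supported:", adapter.features.has("shader-f16"));
```

If this logs false for the adapter the page actually uses, any f16 model (such as the Gemma-2B build mentioned above) will fail regardless of available VRAM.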