
Llama 3.1 Error: Device was lost during reload. This can happen due to insufficient memory or other GPU constraints. Detailed error: [object GPUDeviceLostInfo]. Please try to reload WebLLM with a less resource-intensive model.

djaffer opened this issue 1 year ago • 4 comments

Getting this error with Llama 3.1; all other models are working fine. "Device was lost during reload. This can happen due to insufficient memory or other GPU constraints. Detailed error: [object GPUDeviceLostInfo]. Please try to reload WebLLM with a less resource-intensive model."

djaffer avatar Jul 26 '24 01:07 djaffer

Are you seeing this on chat.webllm.ai? Perhaps try the variant with the -1k suffix, which has a smaller KV cache and hence a lower memory requirement. Also try q4f16_1 instead of q4f32_1.

CharlieFRuan avatar Jul 26 '24 02:07 CharlieFRuan

OK, thanks. Something definitely seems off here; all other models work fine except this one.

Getting this error:

```
Error while parsing WGSL: :4:8 error: extension 'f16' is not allowed in the current environment
enable f16;
       ^^^

 - While validating [ShaderModuleDescriptor]
 - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]).
```

djaffer avatar Jul 26 '24 12:07 djaffer

The f16 error suggests that the WebGPU implementation in your browser/device does not support f16 computation. You can check it manually at https://webgpureport.org/; if supported, you should see shader-f16 listed under features.
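You can also check this programmatically. Below is a minimal sketch (the `checkF16Support` helper is illustrative, not part of WebLLM) that asks the WebGPU adapter whether the `shader-f16` feature is available; run it in the browser console, since it falls back to a notice outside a WebGPU-capable environment.

```javascript
// Hypothetical helper: report whether the WebGPU adapter supports f16
// (the "shader-f16" feature), which q4f16_1 models require.
async function checkF16Support() {
  // navigator.gpu only exists in WebGPU-capable browsers.
  const gpu = typeof navigator !== "undefined" ? navigator.gpu : undefined;
  if (!gpu) return "WebGPU not available in this environment";

  const adapter = await gpu.requestAdapter();
  if (!adapter) return "No WebGPU adapter found";

  // adapter.features is a GPUSupportedFeatures set of feature-name strings.
  return adapter.features.has("shader-f16")
    ? "shader-f16 supported: q4f16_1 models should work"
    : "shader-f16 NOT supported: use q4f32_1 models instead";
}

checkF16Support().then((msg) => console.log(msg));
```

This is the same information webgpureport.org displays, just narrowed to the one feature that matters for the q4f16 model variants.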

The f16 error and the device-lost error are separate issues. Seeing device lost with Llama3.1-q4f32_1 suggests you do not have enough RAM (it requires ~5 GB according to our config.ts); seeing "f16 not supported" with q4f16_1 means your WebGPU implementation does not support f16 computation. On a side note, q4f32 models require more RAM than their q4f16 counterparts; see config.ts for details.
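Putting the two constraints together, model choice comes down to available VRAM and f16 support. A small sketch of that decision logic (the `pickLlama31Variant` helper and its 6 GB threshold are illustrative assumptions, not part of WebLLM; the model-id naming follows WebLLM's prebuilt model list):

```javascript
// Hypothetical helper: pick a Llama 3.1 8B model id based on VRAM budget
// and f16 support. q4f32_1 needs roughly ~5 GB per WebLLM's config.ts;
// the -1k variants shrink the KV cache for tighter memory budgets.
function pickLlama31Variant(vramGB, hasF16) {
  // Prefer q4f16_1 when the WebGPU adapter reports shader-f16 support;
  // fall back to q4f32_1 (larger, but no f16 requirement) otherwise.
  const quant = hasF16 ? "q4f16_1" : "q4f32_1";
  // Assumed threshold for illustration: below ~6 GB, use the 1k-context
  // variant to reduce KV-cache memory.
  const suffix = vramGB < 6 ? "-MLC-1k" : "-MLC";
  return `Llama-3.1-8B-Instruct-${quant}${suffix}`;
}

console.log(pickLlama31Variant(8, false)); // → "Llama-3.1-8B-Instruct-q4f32_1-MLC"
```

The resulting id is what you would pass to WebLLM's engine constructor (e.g. `CreateMLCEngine`) when loading the model.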

CharlieFRuan avatar Aug 01 '24 18:08 CharlieFRuan

Got it, thanks! Not sure why I was recommended that model. Something still seems off with Llama 3.1, since it errors even though the GPU has 8 GB of RAM. Is there a specific reason its size isn't reduced like the other models?

djaffer avatar Aug 03 '24 00:08 djaffer