
The model writes "weird things" after a few questions

kadogo opened this issue 2 years ago • 3 comments

Hello

I have both an Intel and an Nvidia GPU, so I rebuilt the TVM bundle with the "high-performance" change. I noticed that when the model starts writing "weird things" (sentences, characters, and so on that make no sense), I see these errors:

 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).

[664695:1:0418/134935.864490:ERROR:gpu_device.cc(253)] GPUDevice: [Invalid BindGroup] is invalid.
 - While encoding [ComputePassEncoder].SetBindGroup(0, [Invalid BindGroup], 0, ...).

[664695:1:0418/134935.864585:ERROR:gpu_device.cc(253)] GPUDevice: [Invalid CommandBuffer] is invalid.
    at ValidateObject (../../third_party/dawn/src/dawn/native/Device.cpp:671)
    at ValidateSubmit (../../third_party/dawn/src/dawn/native/Queue.cpp:442)

[664695:1:0418/134935.864701:ERROR:gpu_device.cc(253)] GPUDevice: [Invalid Buffer] is invalid.
 - While encoding [CommandEncoder].CopyBufferToBuffer([Buffer], 0, [Invalid Buffer], 2129920, 16384).

I'm not really sure this belongs here as an issue, but maybe someone will have an idea. I can provide additional information if needed.

Cheers

kadogo, Apr 18 '23 13:04
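The "high-performance" change mentioned above most likely refers to the WebGPU `powerPreference` hint passed to `requestAdapter()`; that is an assumption, since the thread does not show the patch. The minimal sketch below asks for the discrete GPU first and falls back to the default adapter. `requestPreferredAdapter` and `GPULike` are hypothetical names introduced for illustration; in a browser you would pass `navigator.gpu`.

```typescript
// Minimal structural stand-in for the WebGPU `GPU` interface, so the sketch
// compiles without @webgpu/types. In a browser, pass `navigator.gpu`.
type PowerPreference = "low-power" | "high-performance";

interface GPULike<A> {
  requestAdapter(options?: { powerPreference?: PowerPreference }): Promise<A | null>;
}

// Hypothetical helper: prefer the discrete GPU, fall back to the default adapter.
// `powerPreference` is only a hint; the browser may still return the integrated GPU.
async function requestPreferredAdapter<A>(gpu: GPULike<A>): Promise<A | null> {
  const discrete = await gpu.requestAdapter({ powerPreference: "high-performance" });
  if (discrete !== null) {
    return discrete;
  }
  return gpu.requestAdapter();
}
```

Because the option is only a hint, picking the Nvidia card this way does not give the model more memory than that card has; it only decides which GPU's memory is used.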

Interesting, this definitely seems to relate to issues in WebGPU. Thank you for sending the console logs, although from the current messages it is a bit hard to see what went wrong at the moment. Please do send us any future logs you encounter; they can be helpful for improving the project.

tqchen, Apr 18 '23 15:04
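The messages are hard to read because the first failure in this kind of log is typically an out-of-memory error, and the `[Invalid Buffer]` / `[Invalid BindGroup]` lines that follow are consequences of it. One way to surface the root cause is to wrap allocations in a WebGPU error scope with the `"out-of-memory"` filter (`pushErrorScope` and `popErrorScope` are standard `GPUDevice` methods). `createBufferChecked` and `DeviceLike` are hypothetical names; this is a minimal sketch, not how web-llm or TVM actually allocate buffers.

```typescript
type ErrorFilter = "validation" | "out-of-memory" | "internal";

// Minimal structural stand-in for the parts of `GPUDevice` used here.
interface DeviceLike<B> {
  pushErrorScope(filter: ErrorFilter): void;
  popErrorScope(): Promise<{ message: string } | null>;
  createBuffer(descriptor: { size: number; usage: number }): B;
}

// Hypothetical helper: allocate a buffer and reject immediately if the GPU
// reports an out-of-memory error, instead of letting an invalid buffer
// propagate into later CopyBufferToBuffer / CreateBindGroup calls.
async function createBufferChecked<B>(
  device: DeviceLike<B>,
  size: number,
  usage: number,
): Promise<B> {
  device.pushErrorScope("out-of-memory");
  const buffer = device.createBuffer({ size, usage });
  const error = await device.popErrorScope();
  if (error !== null) {
    throw new Error(`GPU out of memory while allocating ${size} bytes: ${error.message}`);
  }
  return buffer;
}
```

In a browser this would be called as `await createBufferChecked(device, size, GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST)`. Error scopes only capture errors from calls made while the scope is pushed; anything else still surfaces through the device's `uncapturederror` event.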

Hello @tqchen

I think I found the reason:


vkAllocateMemory failed with VK_ERROR_OUT_OF_DEVICE_MEMORY
    at CheckVkOOMThenSuccessImpl (..<URL>)
    at AllocateResourceHeap (..<URL>)
    at Allocate (..<URL>)
    at Initialize (..<URL>)
    at Create (..<URL>)
    at CreateBuffer (..<URL>)
[Invalid Buffer] is invalid.
 - While encoding [CommandEncoder].CopyBufferToBuffer([Buffer], 0, [Invalid Buffer], 0, 2097152).
[Invalid CommandBuffer] is invalid.
    at ValidateObject (..<URL>)
    at ValidateSubmit (..<URL>)
[Invalid Buffer] is invalid.
 - While encoding [CommandEncoder].CopyBufferToBuffer([Buffer], 0, [Invalid Buffer], 2097152, 16384).
[Invalid Buffer] is invalid.
 - While validating entries[1] as a Buffer.
Expected entry layout: { binding: 1, visibility: ShaderStage::Compute, buffer: { type: BufferBindingType::ReadOnlyStorage, hasDynamicOffset: 0, minBindingSize: 0 } }
 - While validating [BindGroupDescriptor] against [BindGroupLayout]
 - While calling [Device].CreateBindGroup([BindGroupDescriptor]).
[Invalid BindGroup] is invalid.
 - While encoding [ComputePassEncoder].SetBindGroup(0, [Invalid BindGroup], 0, ...).
[Invalid Buffer] is invalid.
 - While encoding [CommandEncoder].CopyBufferToBuffer([Buffer], 0, [Invalid Buffer], 2113536, 16384).
127.0.0.1/:1 WebGPU: too many warnings, no more warnings will be reported to the console for this GPUDevice.

Checking with nvidia-smi, I noticed that my card was at 6025/6144 MiB, and based on the messages in the browser console, I guess that I don't have enough GPU memory.

kadogo, Apr 18 '23 16:04
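The reading above leaves only 6144 − 6025 = 119 MiB free on a 6 GiB card. To check this repeatedly, nvidia-smi can print machine-readable output: `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` prints one `used, total` line per GPU, in MiB. The parser below is a hypothetical helper that assumes exactly that output format; it only covers Nvidia cards, so the Intel GPU would need a different tool.

```typescript
interface GpuMemory {
  usedMiB: number;
  totalMiB: number;
  freeMiB: number;
}

// Hypothetical helper: parse the output of
//   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
// which prints one "used, total" line (in MiB) per GPU.
function parseNvidiaSmiMemory(output: string): GpuMemory[] {
  return output
    .split("\n")
    .map((line) => line.trim()) // also drops a trailing "\r" on Windows
    .filter((line) => line.length > 0)
    .map((line) => {
      const [used, total] = line.split(",").map((field) => Number(field.trim()));
      // Reject lines such as "N/A, 6144" or a missing second column.
      if (!Number.isFinite(used) || !Number.isFinite(total)) {
        throw new Error(`unexpected nvidia-smi line: "${line}"`);
      }
      return { usedMiB: used, totalMiB: total, freeMiB: total - used };
    });
}
```

With the numbers from this thread, `parseNvidiaSmiMemory("6025, 6144")` returns a single entry whose `freeMiB` is 119.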

I'm trying the demo page with my Intel card, and I get this error while caching:

vkAllocateMemory failed with VK_ERROR_OUT_OF_DEVICE_MEMORY
 - While handling unexpected error type OutOfMemory when allowed errors are (Validation|DeviceLost).
    at CheckVkOOMThenSuccessImpl (../../third_party/dawn/src/dawn/native/vulkan/VulkanError.cpp:101)
    at AllocateResourceHeap (../../third_party/dawn/src/dawn/native/vulkan/ResourceMemoryAllocatorVk.cpp:93)
    at Allocate (../../third_party/dawn/src/dawn/native/vulkan/ResourceMemoryAllocatorVk.cpp:166)
    at Initialize (../../third_party/dawn/src/dawn/native/vulkan/BufferVk.cpp:210)
    at Create (../../third_party/dawn/src/dawn/native/vulkan/BufferVk.cpp:140)
    at CreateBuffer (../../third_party/dawn/src/dawn/native/Device.cpp:1480)
    at AllocateInternal (../../third_party/dawn/src/dawn/native/DynamicUploader.cpp:46)
    at Allocate (../../third_party/dawn/src/dawn/native/DynamicUploader.cpp:136)
    at WriteBufferImpl (../../third_party/dawn/src/dawn/native/Queue.cpp:314)

Maybe it's related to my setup; I will give it a try on the Steam Deck.

Edit: On the Steam Deck, because it only uses Flatpak, I think there are some issues with making the GPU visible, so I will not spend too much time on this one.

I will try making a live USB with at least 6 GB of persistent storage for the cache, or reinstall the PC, and try again.

kadogo, Apr 18 '23 17:04

Hello again @tqchen

I think it really is a memory issue.

I could reproduce the same error with my Intel card from a live USB. I couldn't try the Nvidia card because I had trouble with the drivers, but it sounds like memory is the problem.

If there is ever a way to offload it to the CPU or disk, I will try again ^^ Don't hesitate to ping me if there are any new tests to run.

Cheers

kadogo, Apr 18 '23 22:04

Thank you for getting to the bottom of this! Memory is indeed an area we should work harder on, and hopefully we will look into it after fp16 support lands. I will close this for now.

tqchen, Apr 18 '23 23:04
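For context on the fp16 remark: a buffer stored as 16-bit floats takes half the bytes of the same buffer in 32-bit floats, and on WebGPU, f16 arithmetic in shaders requires the optional `"shader-f16"` feature, which must be present in the adapter's `features` and listed in `requiredFeatures` when calling `requestDevice`. The sketch below only illustrates that check and the size arithmetic under those assumptions; it does not describe how web-llm implemented fp16, and `planWeightPrecision` and `footprintMiB` are hypothetical names.

```typescript
type Precision = "f16" | "f32";

// Minimal structural stand-in for the part of `GPUAdapter` used here:
// `features` is a set-like collection of feature names such as "shader-f16".
interface AdapterLike {
  features: { has(name: string): boolean };
}

interface PrecisionPlan {
  precision: Precision;
  requiredFeatures: Array<"shader-f16">; // pass to adapter.requestDevice({ requiredFeatures })
  bytesPerValue: number;
}

// Hypothetical helper: use f16 when the adapter supports "shader-f16",
// otherwise fall back to f32.
function planWeightPrecision(adapter: AdapterLike): PrecisionPlan {
  if (adapter.features.has("shader-f16")) {
    return { precision: "f16", requiredFeatures: ["shader-f16"], bytesPerValue: 2 };
  }
  return { precision: "f32", requiredFeatures: [], bytesPerValue: 4 };
}

// Memory needed to hold `count` values at the chosen precision, in MiB.
function footprintMiB(count: number, plan: PrecisionPlan): number {
  return (count * plan.bytesPerValue) / (1024 * 1024);
}
```

For example, a hypothetical buffer of 50 million values needs about 95 MiB in f16 but about 191 MiB in f32.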