Model output is scrambled in Safari Technology Preview, which has WebGPU support

Open felladrin opened this issue 1 year ago • 1 comments

Just wanted to let you know that I installed Safari Technology Preview to check its WebGPU support and noticed that the output of any model in Web-LLM appears scrambled, like this:

[two screenshots of scrambled model output]

And sometimes it just repeats the same word:

https://github.com/mlc-ai/web-llm/assets/418083/176653dd-d4a5-4a5a-b950-4b776e7783e2

Although not urgent, it may be worth looking into before the next major release of Safari.

felladrin avatar May 02 '24 15:05 felladrin

This is a known issue that we are working with the Safari team to address: https://bugs.webkit.org/show_bug.cgi?id=266793

tqchen avatar May 02 '24 15:05 tqchen

This is working for me in Safari Technology Preview 193: https://developer.apple.com/safari/resources/ [image: stp193]

mwyrzykowski avatar Jul 04 '24 00:07 mwyrzykowski

Safari in the iOS 18 developer betas is still non-functional, as it appears web-llm requires 1024MB buffers: [image: ios18beta]

The default max buffer size in WebGPU is 256MB. I tried raising the limit to 1024MB, but then iOS Safari is terminated due to memory pressure. I filed https://bugs.webkit.org/show_bug.cgi?id=275958 to track that issue.

mwyrzykowski avatar Jul 04 '24 00:07 mwyrzykowski

I'm not sure whether this issue should be closed now that it is working in Safari Technology Preview 193, with a new one opened for iOS support?

iOS support is a bit trickier. While everything should 'just work' as it does on an Apple Silicon Mac, if WebLLM really does require a 1GB buffer and allocates a 1024MB JavaScript array, this will lead to memory issues on iOS.

It would be preferable if WebLLM could work within WebGPU's default 256MB buffer size limit, if possible, potentially creating multiple buffers where needed and never creating a JS array buffer larger than 256MB at once.
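As an illustration of the multiple-buffer idea, here is a minimal TypeScript sketch of how a large logical allocation could be split into pieces that each fit under a per-buffer limit. The names `planChunks` and `ChunkPlan` are hypothetical and are not taken from WebLLM; the sketch only does the size arithmetic and does not address how shaders would address data spread across several buffers.

```typescript
// Hypothetical helper, not WebLLM code: split a logical allocation into
// chunks that each fit within a per-buffer size limit.
const MiB = 1024 * 1024;

interface ChunkPlan {
  offset: number; // byte offset into the logical allocation
  size: number;   // byte length of this chunk
}

function planChunks(totalBytes: number, maxChunkBytes: number): ChunkPlan[] {
  if (!Number.isInteger(totalBytes) || totalBytes < 0) {
    throw new RangeError("totalBytes must be a non-negative integer");
  }
  if (!Number.isInteger(maxChunkBytes) || maxChunkBytes <= 0) {
    throw new RangeError("maxChunkBytes must be a positive integer");
  }
  const chunks: ChunkPlan[] = [];
  for (let offset = 0; offset < totalBytes; offset += maxChunkBytes) {
    // The last chunk may be smaller than the limit.
    chunks.push({ offset, size: Math.min(maxChunkBytes, totalBytes - offset) });
  }
  return chunks;
}

// A 1024MB allocation under a 256MB limit needs four buffers.
const plan = planChunks(1024 * MiB, 256 * MiB);
console.log(plan.length);
```

Each entry in the plan could then back its own GPU buffer, so that no single buffer (and no single JS staging array) exceeds the limit.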

mwyrzykowski avatar Jul 04 '24 00:07 mwyrzykowski

Thank you @mwyrzykowski for enabling this; this issue can be closed.

There are smaller models that we use for smaller contexts (the models whose names end with the -1k suffix) that might work within the 256MB limit, as Android also has this limit.

tqchen avatar Jul 07 '24 12:07 tqchen

Hi @mwyrzykowski, thanks for the support and sorry for the delayed response. WebLLM after npm 0.2.47 should fall back to a maxBufferSize of 256MB if the 1024MB request hits the limit: https://github.com/mlc-ai/web-llm/pull/498
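For readers unfamiliar with how such a fallback can work, here is a minimal TypeScript sketch of the selection logic. It is an assumption-based illustration, not the code from the linked PR: the function name `pickMaxBufferSize` and the two constants are hypothetical.

```typescript
// Hypothetical sketch of a "prefer 1024MB, fall back to 256MB" choice.
// Not taken from the linked PR; names and structure are assumptions.
const MIB = 1024 * 1024;
const PREFERRED_MAX_BUFFER = 1024 * MIB;
const FALLBACK_MAX_BUFFER = 256 * MIB; // WebGPU's default maxBufferSize

function pickMaxBufferSize(adapterMaxBufferSize: number): number {
  if (adapterMaxBufferSize >= PREFERRED_MAX_BUFFER) {
    return PREFERRED_MAX_BUFFER;
  }
  if (adapterMaxBufferSize >= FALLBACK_MAX_BUFFER) {
    return FALLBACK_MAX_BUFFER;
  }
  throw new Error(
    `adapter supports only ${adapterMaxBufferSize} bytes per buffer`,
  );
}

// Example with an adapter that reports the default 256MB limit.
console.log(pickMaxBufferSize(256 * MIB) / MIB);
```

In a browser, the input would come from `adapter.limits.maxBufferSize`, and the result would be passed as `requiredLimits: { maxBufferSize }` to `adapter.requestDevice(...)`. Choosing the value up front avoids requesting a limit the adapter cannot grant.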

Tried https://chat.webllm.ai/ on my iOS 18 developer beta and it works well; the speed is great too!

CharlieFRuan avatar Jul 17 '24 03:07 CharlieFRuan

Oh that is amazing news, thank you so much for the update!

mwyrzykowski avatar Jul 17 '24 05:07 mwyrzykowski

Actually, @mwyrzykowski, it would be nice if iOS could allow bigger buffers, since some of the 3B models fit well on mobile and may have 1GB buffer requirements.

tqchen avatar Jul 17 '24 13:07 tqchen

> Actually, @mwyrzykowski, it would be nice if iOS could allow bigger buffers, since some of the 3B models fit well on mobile and may have 1GB buffer requirements.

It is possible, but the memory limit per web process is 1.5GB. If we raise the buffer limit, the website needs to be careful when populating the buffer, because both the ArrayBuffer in JavaScript and the buffer allocation within WebGPU count towards that 1.5GB limit.

The WebGPU specification currently requires we make a copy of the buffer so I don’t think we can directly use the JS allocation.

To use a 1GB buffer, you would also need to avoid allocating a 1GB buffer on the JS side, because at that point you would be at 2GB and the web process would get terminated. Rather, the GPUBuffer instance would need to be filled incrementally, for example in 128MB chunks. That should remain below the memory limit.
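The incremental fill described above can be sketched roughly as follows in TypeScript. This is an illustration under stated assumptions, not WebLLM or WebKit code: `QueueLike` is a hypothetical stand-in for the subset of `GPUQueue` used here (a real `GPUQueue` has a compatible `writeBuffer` method), and `fillInChunks`, `produceChunk`, and `fitsProcessLimit` are made-up names. The budget check uses only the figures quoted in this thread (1.5GB per web process, with the JS staging data and the GPU buffer both counted) and ignores any other memory the page uses.

```typescript
// Hypothetical stand-in for the part of GPUQueue used below.
interface QueueLike<B> {
  writeBuffer(buffer: B, bufferOffset: number, data: Uint8Array): void;
}

const MB = 1024 * 1024;
const PROCESS_LIMIT = 1536 * MB; // 1.5GB per web process, as stated in the thread

// Rough accounting only: GPU buffer plus the largest JS staging array.
function fitsProcessLimit(gpuBytes: number, stagingBytes: number): boolean {
  return gpuBytes + stagingBytes <= PROCESS_LIMIT;
}

// Fill a buffer piece by piece so only one chunk is staged in JS at a time.
function fillInChunks<B>(
  queue: QueueLike<B>,
  buffer: B,
  totalBytes: number,
  chunkBytes: number,
  produceChunk: (offset: number, size: number) => Uint8Array,
): number {
  // WebGPU's writeBuffer requires 4-byte aligned offsets and sizes.
  if (totalBytes % 4 !== 0 || chunkBytes % 4 !== 0 || chunkBytes <= 0) {
    throw new RangeError("sizes must be positive multiples of 4 bytes");
  }
  let written = 0;
  while (written < totalBytes) {
    const size = Math.min(chunkBytes, totalBytes - written);
    const data = produceChunk(written, size); // e.g. decode one slice of weights
    if (data.byteLength !== size) {
      throw new Error("produceChunk returned the wrong number of bytes");
    }
    queue.writeBuffer(buffer, written, data);
    written += size;
  }
  return written;
}

console.log(fitsProcessLimit(1024 * MB, 1024 * MB)); // 2GB total
console.log(fitsProcessLimit(1024 * MB, 128 * MB));  // 1152MB total
```

With a real device, `queue` would be `device.queue` and `buffer` a `GPUBuffer` created with `GPUBufferUsage.COPY_DST`. The point of the callback is that each chunk is produced on demand, so the full 1GB never exists on the JS side at once.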

mwyrzykowski avatar Jul 17 '24 15:07 mwyrzykowski

@tqchen is it possible to achieve the same with four 256MB buffers? WebKit's concern with raising the limit is that an arbitrary website may see 1GB and create a 1GB JavaScript ArrayBuffer to fill it, which will work fine on macOS but not on iOS.

mwyrzykowski avatar Jul 17 '24 15:07 mwyrzykowski

Got it, thank you @mwyrzykowski for explaining. I think we can stay with 256MB for now then; hopefully some of the smaller models will still work well.

tqchen avatar Jul 17 '24 16:07 tqchen