wllama Unreachable

Start LLM
Close laptop
Sleep 8 hours
Open laptop
Issue command to LLM

Jun 05 '24 05:06 flatsiedatsie

It's probably a browser issue (the equivalent of a segfault in a native build).

Jun 05 '24 15:06 ngxson

I think the browser may be clearing the blobs from memory when the tab gets suspended (after going unused for some time).
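If tab suspension is the cause, one possible workaround is to track how long the page was hidden and treat the loaded model as stale after a long pause. A minimal sketch, assuming that theory holds: `ResumeGuard` and the 30-minute threshold are hypothetical and not part of wllama, and the actual reload step is left to the application.

```typescript
// Hypothetical helper (not part of wllama): remembers when the page was
// hidden and reports whether it stayed hidden longer than a threshold.
class ResumeGuard {
  private hiddenAtMs: number | null = null;
  private readonly maxHiddenMs: number;

  constructor(maxHiddenMs: number) {
    this.maxHiddenMs = maxHiddenMs;
  }

  // Call when the page becomes hidden (e.g. the laptop lid is closed).
  onHidden(nowMs: number): void {
    this.hiddenAtMs = nowMs;
  }

  // Call when the page becomes visible again. Returns true if the model
  // should be treated as stale and reloaded before accepting a prompt.
  onVisible(nowMs: number): boolean {
    if (this.hiddenAtMs === null) return false;
    const hiddenForMs = nowMs - this.hiddenAtMs;
    this.hiddenAtMs = null;
    return hiddenForMs > this.maxHiddenMs;
  }
}

// Browser wiring (sketch, threshold is an assumption):
// const guard = new ResumeGuard(30 * 60 * 1000);
// document.addEventListener("visibilitychange", () => {
//   if (document.visibilityState === "hidden") {
//     guard.onHidden(Date.now());
//   } else if (guard.onVisible(Date.now())) {
//     // Reload the model here before sending the next command.
//   }
// });
```

This only detects the long pause; it does not confirm that the browser actually dropped the blobs.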

Jun 05 '24 16:06 felladrin


I just noticed this one again. This time on an Android mobile phone (Pixel 6a, Chrome), with just one browser tab open, and everything else closed manually.

I was trying to load a Gemma 2 2B it model. https://huggingface.co/BoscoTheDog/gemma_2_2b_it_Q4_gguf_chunked

Context is set to 1K, the model is 1.63GB, and the Pixel has 6GB of RAM. According to the OS my average memory use is 3GB.

"I think the browser may be clearing the blobs from the memory when the tab gets suspended"

I don't think that's the case here, as the tab is the currently active one. Maybe it's just an out-of-memory issue? Or maybe, like on mobile Safari, there's a limit on how much RAM a tab may use?
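A rough back-of-the-envelope check of the numbers above. This is only a sketch: `estimateFit` is a hypothetical helper, the 1.2 overhead factor for context and runtime buffers is a guess, GB is treated as GiB, and a 32-bit WebAssembly build (4 GiB address-space ceiling) is assumed. Any per-tab limit the browser imposes is unknown and not modelled.

```typescript
const GIB = 1024 ** 3;

// A wasm32 linear memory cannot exceed 4 GiB, regardless of device RAM.
const WASM32_MAX_BYTES = 4 * GIB;

interface MemoryEstimate {
  requiredBytes: number;
  freeBytes: number;
  fits: boolean;
}

// Hypothetical helper: compares model size (plus an assumed overhead
// factor) against free device RAM and the wasm32 ceiling.
function estimateFit(
  modelBytes: number,
  totalRamBytes: number,
  usedRamBytes: number,
  overheadFactor: number = 1.2, // assumption, not a measured value
): MemoryEstimate {
  const requiredBytes = modelBytes * overheadFactor;
  const freeBytes = Math.max(0, totalRamBytes - usedRamBytes);
  const fits = requiredBytes <= freeBytes && requiredBytes <= WASM32_MAX_BYTES;
  return { requiredBytes, freeBytes, fits };
}

// Figures from this comment: 1.63 GB model, 6 GB RAM, about 3 GB in use.
const pixel = estimateFit(1.63 * GIB, 6 * GIB, 3 * GIB);
console.log(pixel.fits); // true: about 1.96 GiB needed vs 3 GiB free
```

On paper the model fits, so plain RAM exhaustion alone would not explain the failure under these assumptions.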

Aug 04 '24 19:08 flatsiedatsie

I tried to load another 1.6GB (Bitnet) model on the phone, and that did load. Hmm.

I'll do a quick git clone --recurse-submodules https://github.com/ngxson/wllama.git; cd wllama; git submodule update --remote --merge; npm i; npm run build:wasm; npm run build.

// Nice, Phi 3.1 mini loads and (very slowly) generates a response. It's 2.1GB.

// Updating llama.cpp solved it.

Aug 04 '24 20:08 flatsiedatsie