WSL support: "Killed" when executed on WSL
Under WSL everything seems normal, but on execution the only output is "Killed".
$ ./gemma --tokenizer /home/home/F/GPT/Gemma/2b-it-cpp/tokenizer.spm \
    --compressed_weights /home/home/F/GPT/Gemma/2b-it-cpp/2b-it.sbs \
    --model 2b-it
Killed
ENV: clang-10.0, Ubuntu 18.04 and Ubuntu 20.04, CMake
It looks like your machine is unable to support the 2B model.
Hi @Code-keys, can you try with 2b-it-sfp.sbs? SFP uses compressed 8-bit weights. Also make sure to use the right build (2b-it vs. 2b-it-sfp require different build settings); we've updated the README accordingly. Can you share any information about your hardware (RAM, etc.)?
I will give it a try.
I get this with the 7B model but not 2B. For me it's a memory-limit issue: the process seems to be killed when it runs out of RAM (8 GB on my laptop) while loading the 7B model.
A better error message would be helpful.
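Since several reports here point at the WSL memory ceiling, one way to rule that out is to check available RAM inside the guest (e.g. with `free -h`) and, if needed, raise WSL2's limit via the Windows-side `.wslconfig` file. A minimal sketch; the sizes below are example values, not recommendations, so size them to your machine:

```
# %UserProfile%\.wslconfig  (Windows side; apply with `wsl --shutdown`)
[wsl2]
memory=12GB   # raise the guest RAM ceiling (example value)
swap=8GB      # optional: extra swap headroom for large models
```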
Using the newest version of the project works fine with the 2b-pt model.
The precision of 2b-it is good, while 2b-pt seems very poor. Why?
Because 2b-it is instruction-tuned and 2b-pt is not. You're not comparing the same type of model.
The root cause is OOM. :100:
Re memory issues: I'm going to adjust some defaults and make kSeqLen configurable, which should improve the situation a bit.
In configs.h, the key parameter kSeqLen preallocates a KV cache of roughly:
kSeqLen x kLayers x kKVHeads x kQKVDim x 2 x sizeof(float) bytes
Short-term remediation:
- Smaller default size
- Have it be configurable at compile time
Medium-term remediation:
- Small initial preallocation and dynamically resize as needed.
- Better OOM checking + error messages
A PR with the short-term remediation should be coming today; if you want to experiment yourself, you can tweak kSeqLen and see if that helps.