gemma.cpp

WSL support: "Killed", once executed on WSL

Open Code-keys opened this issue 1 year ago • 4 comments

On WSL everything seems to build normally, but once executed, gemma prints only "Killed".

$ ./gemma --tokenizer /home/home/F/GPT/Gemma/2b-it-cpp/tokenizer.spm \
> --compressed_weights /home/home/F/GPT/Gemma/2b-it-cpp/2b-it.sbs \
> --model 2b-it
Killed

ENV: clang-10.0, cmake, Ubuntu 18.04 and Ubuntu 20.04

Code-keys avatar Feb 24 '24 08:02 Code-keys

Looks like your machine is unable to support the 2B model.

dannyvpm avatar Feb 24 '24 09:02 dannyvpm

Hi @Code-keys, can you try with 2b-it-sfp.sbs? SFP uses compressed 8-bit weights. Also make sure to use the right build (2b-it and 2b-it-sfp use different build settings); we've updated the README. Can you share any information about your hardware (RAM, etc.)?

austinvhuang avatar Feb 24 '24 16:02 austinvhuang

I will give it a try.

Code-keys avatar Feb 25 '24 11:02 Code-keys

I get this with the 7b model but not with 2b. For me it's a memory limit issue: the process seems to be killed when it runs out of RAM (8 GB on my laptop) while storing the 7b model.

A better error message would be helpful.

ledurnan avatar Feb 25 '24 13:02 ledurnan

Using the newest version of the project works fine with the 2b-pt model.

Code-keys avatar Feb 26 '24 02:02 Code-keys

The output quality of 2b-it is good, while 2b-pt seems much worse. Why?

Code-keys avatar Feb 26 '24 02:02 Code-keys

> The output quality of 2b-it is good, while 2b-pt seems much worse. Why?

Because 2b-it is instruction tuned. 2b-pt is not. You're not comparing the same type of model.

ledurnan avatar Feb 26 '24 08:02 ledurnan

The root cause is OOM. :100:

Code-keys avatar Feb 26 '24 13:02 Code-keys

Re memory issues: I'm going to adjust some defaults and make kSeqLen configurable, which should improve the situation a bit.

In configs.h, a key parameter is kSeqLen, which sets the size of the preallocated KV cache to approximately:

kSeqLen x kLayers x kKVHeads x kQKVDim x 2 x sizeof(float)
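To get a feel for the numbers, here is a minimal sketch that evaluates the formula above. The function name KVCacheBytes and the two example configurations are assumptions chosen for illustration; they are not read from configs.h, so substitute the values from your own build.

```cpp
#include <cstdint>

// Bytes preallocated for the KV cache, following the formula above.
// The factor 2 accounts for storing both keys and values.
// KVCacheBytes is a hypothetical helper, not part of gemma.cpp.
constexpr std::uint64_t KVCacheBytes(std::uint64_t seq_len,
                                     std::uint64_t layers,
                                     std::uint64_t kv_heads,
                                     std::uint64_t qkv_dim) {
  return seq_len * layers * kv_heads * qkv_dim * 2 * sizeof(float);
}

// Assumed example configurations (illustrative values only).
constexpr std::uint64_t kSmallConfigBytes = KVCacheBytes(4096, 18, 1, 256);
constexpr std::uint64_t kLargeConfigBytes = KVCacheBytes(4096, 28, 16, 256);

// With a 4-byte float these come to 144 MiB and 3.5 GiB respectively.
static_assert(sizeof(float) != 4 || kSmallConfigBytes == 150994944ULL,
              "small example config should be 144 MiB");
static_assert(sizeof(float) != 4 || kLargeConfigBytes == 3758096384ULL,
              "large example config should be 3.5 GiB");
```

Under these assumed values the larger configuration needs 3.5 GiB for the cache alone, before any weights are loaded, which is consistent with a large model being killed on an 8 GB machine. Halving seq_len halves the cache size.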

Short term remediation:

  • Smaller default size
  • Have it be configurable at compile time
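One common way to get both short-term items is a preprocessor macro with a default that can be overridden on the compiler command line. The sketch below only illustrates that pattern; EXAMPLE_MAX_SEQLEN and kExampleSeqLen are hypothetical names, and the default of 4096 is an assumed value, not necessarily what the PR uses.

```cpp
#include <cstddef>

// EXAMPLE_MAX_SEQLEN is a hypothetical macro name used for illustration.
// If the build does not define it, fall back to an assumed smaller default.
#ifndef EXAMPLE_MAX_SEQLEN
#define EXAMPLE_MAX_SEQLEN 4096
#endif

// The constant the rest of the code would use when sizing the KV cache.
constexpr std::size_t kExampleSeqLen = EXAMPLE_MAX_SEQLEN;

// Catch nonsensical overrides at compile time rather than at run time.
static_assert(kExampleSeqLen > 0, "sequence length must be positive");
```

With this pattern, passing a flag such as -DEXAMPLE_MAX_SEQLEN=2048 to the compiler shrinks the preallocation without editing the source.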

Medium term remediation:

  • Small initial preallocation and dynamically resize as needed.
  • Better OOM checking + error messages
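As a rough idea of what a friendlier OOM check could look like on Linux (including WSL), the sketch below compares a required byte count against the physical memory the system reports as available and prints a message instead of letting the process be killed. This is an assumption about one possible approach, not the planned implementation. AvailablePhysicalBytes and CheckMemoryOrWarn are hypothetical helpers, and _SC_AVPHYS_PAGES is a glibc extension that does not count reclaimable page cache, so the check is conservative.

```cpp
#include <unistd.h>

#include <cstdint>
#include <cstdio>

// Returns the physical memory currently reported as available, in bytes,
// or 0 if the value cannot be determined. Hypothetical helper.
inline std::uint64_t AvailablePhysicalBytes() {
  const long pages = sysconf(_SC_AVPHYS_PAGES);  // glibc extension
  const long page_size = sysconf(_SC_PAGESIZE);
  if (pages < 0 || page_size < 0) return 0;
  return static_cast<std::uint64_t>(pages) *
         static_cast<std::uint64_t>(page_size);
}

// Returns false and prints a diagnostic if required_bytes exceeds the
// available physical memory. If availability is unknown, returns true
// so that the caller proceeds as before. Hypothetical helper.
inline bool CheckMemoryOrWarn(std::uint64_t required_bytes) {
  const std::uint64_t available = AvailablePhysicalBytes();
  if (available != 0 && required_bytes > available) {
    std::fprintf(stderr,
                 "Not enough memory: need %llu bytes, %llu available. "
                 "Try a smaller model or a shorter sequence length.\n",
                 static_cast<unsigned long long>(required_bytes),
                 static_cast<unsigned long long>(available));
    return false;
  }
  return true;
}
```

A caller would run this check before allocating the weights and the KV cache and exit with the message if it fails. Checking the allocation result alone is often not enough on Linux, because memory overcommit can let the allocation succeed and the kernel kill the process later when the pages are touched.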

A PR for the short-term remediation should be coming today. If you want to experiment yourself in the meantime, you can tweak kSeqLen and see if that helps.

austinvhuang avatar Feb 26 '24 15:02 austinvhuang