WSL support: "Killed" when executed on WSL
Under WSL everything seems normal, but on execution the only output is "Killed".
$ ./gemma --tokenizer /home/home/F/GPT/Gemma/2b-it-cpp/tokenizer.spm \
    --compressed_weights /home/home/F/GPT/Gemma/2b-it-cpp/2b-it.sbs \
    --model 2b-it
Killed
ENV: clang-10.0, Ubuntu 18.04 and Ubuntu 20.04, CMake
It looks like your machine is unable to support the 2B model.
Hi @Code-keys, can you try with 2b-it-sfp.sbs? SFP uses compressed 8-bit weights. Also make sure to use the right build (2b-it vs. 2b-it-sfp require different build settings); we've updated the README accordingly. Can you share any information about your hardware (RAM, etc.)?
I will give it a try.
I get this with the 7B model but not 2B. For me it's a memory-limit issue: the process seems to be killed when it runs out of RAM (8 GB on my laptop) while loading the 7B model.
A better error message would be helpful.
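Since several reports here point at the WSL memory ceiling, one way to rule that out is to check available RAM inside the guest (e.g. with `free -h`) and, if needed, raise WSL2's limit via the Windows-side `.wslconfig` file. A minimal sketch; the sizes below are example values, not recommendations, so size them to your machine:

```
# %UserProfile%\.wslconfig  (Windows side; apply with `wsl --shutdown`)
[wsl2]
memory=12GB   # raise the guest RAM ceiling (example value)
swap=8GB      # optional: extra swap headroom for large models
```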
Using the newest version of the project works fine with the 2b-pt model.
The precision of 2b-it is good, while 2b-pt seems very poor. Why?
Because 2b-it is instruction-tuned and 2b-pt is not. You're not comparing the same type of model.
The root cause is OOM. :100:
Re memory issues: I'm going to adjust some defaults and make kSeqLen configurable, which should improve the situation a bit.
In configs.h, the key parameter kSeqLen preallocates a KV cache of roughly:
kSeqLen x kLayers x kKVHeads x kQKVDim x 2 x sizeof(float) bytes
Short-term remediation:
- Smaller default size
- Have it be configurable at compile time
Medium-term remediation:
- Small initial preallocation and dynamically resize as needed.
- Better OOM checking + error messages
A PR with the short-term remediation should be coming today; if you want to experiment yourself, you can tweak kSeqLen and see if that helps.