GPT-2 segfaults when used through the CLI
Trying any GPT-2 GGML model through the CLI appears to cause an immediate segfault:
llama-rs # cargo run --bin llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_0.bin -p "Now, this is a story all about how"
[...]
[2023-05-01T23:43:17Z INFO llm::cli_args] Model fully loaded! Elapsed: 75ms
zsh: segmentation fault cargo run --bin llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_0.bin -p
This appears to be true regardless of the model (both Cerebras and base GPT-2 seem to suffer from this).
This doesn't happen when run through the GPT-2 example.
I wonder if this has to do w/ loading through the snapshot.
I am not able to reproduce this problem
llama-rs: ./target/release/llm gpt2 infer -m ~/.ggml-models/cerebras-gpt-13b.bin -p "Hello my name is"
[2023-05-03T17:55:57Z INFO llm::cli_args] ggml ctx size = 7857.04 MB
[2023-05-03T17:55:57Z INFO llm::cli_args] Loaded tensor 8/485
...
[2023-05-03T17:56:02Z INFO llm::cli_args] Loaded tensor 480/485
[2023-05-03T17:56:02Z INFO llm::cli_args] Loading of model complete
[2023-05-03T17:56:02Z INFO llm::cli_args] Model size = 0.00 MB / num tensors = 485
[2023-05-03T17:56:02Z INFO llm::cli_args] Model fully loaded! Elapsed: 5008ms
"Hello my name is 'Celest,' and you're looking for a guy named..." "Marius." ""I'm looking for Marius^C
How weird... is that q4 or f16?
q4? I'm not sure honestly 😅 I think I'm testing w/ this model that appears to have been taken down 🤷🏻 https://huggingface.co/mongolian-basket-weaving/cerebras-gpt-13b-ggml-q4_0
Is this wrong?
https://github.com/rustformers/llm/blob/be56c36/crates/models/gpt2/src/lib.rs#L314-L316
Ok, just tested with https://huggingface.co/xzuyn/GPT-2-124M-ggml-q4_1/blob/main/ggml-model-q4_1.bin on macOS:
# cargo run --bin llm gpt2 infer -m models/gpt2/GPT-2-124M-ggml-q4_1.bin -p "1 + 2 = "
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `target/debug/llm gpt2 infer -m models/gpt2/GPT-2-124M-ggml-q4_1.bin -p '1 + 2 = '`
✓ Loaded 149 tensors (125.8 MB) after 153ms
zsh: segmentation fault cargo run --bin llm gpt2 infer -m models/gpt2/GPT-2-124M-ggml-q4_1.bin -p
Aha - I think you've figured it out...
Running with `--num-ctx-tokens 1024` doesn't segfault for me. Our default of 2048 doesn't work for all models. Oops.
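For context, here's a rough sketch of why context length matters for memory: the KV cache grows linearly with `n_ctx`, so a fixed default of 2048 can request more memory than a model's buffers were sized for. The function name and the exact formula below are illustrative assumptions, not the actual calculation in `llm`:

```rust
// Hypothetical KV-cache sizing sketch: keys + values, one f32 per
// (layer, position, embedding dim). Illustrative only; not the llm formula.
fn kv_cache_bytes(n_ctx: usize, n_layer: usize, n_embd: usize) -> usize {
    2 * n_layer * n_ctx * n_embd * std::mem::size_of::<f32>()
}

fn main() {
    // GPT-2 124M: 12 layers, 768-dim embeddings
    println!(
        "n_ctx=1024: {} MiB",
        kv_cache_bytes(1024, 12, 768) / (1024 * 1024)
    );
    println!(
        "n_ctx=2048: {} MiB",
        kv_cache_bytes(2048, 12, 768) / (1024 * 1024)
    );
}
```

Doubling `n_ctx` doubles the cache, which is why a universal default can blow past what a smaller model's context buffer was allocated for.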
Or maybe not.
# cargo run --release --bin llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_1.bin -p "Fred looked at his hand and wondered: " --num-ctx-tokens 512
Finished release [optimized] target(s) in 0.08s
Running `target/release/llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_1.bin -p 'Fred looked at his hand and wondered: ' --num-ctx-tokens 512`
✓ Loaded 389 tensors (5.6 GB) after 91ms
zsh: segmentation fault cargo run --release --bin llm gpt2 infer -m models/gpt2/cerebras-2.7b-q4_1.bi
Quick findings with a debugger:
- Only seems to happen with the mmap'd model
- The segfault occurs here (`data` is invalid): https://github.com/ggerganov/ggml/blob/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/src/ggml.c#L9146
- The only place `get_rows` is used is here: https://github.com/rustformers/llm/blob/7c2edb13149ff78765134e97190fb1f80a2fa39d/crates/models/gpt2/src/lib.rs#L152-L153
- Thus, one of these two tensors is likely not loading correctly through mmap
- Using a sane context length and `--no-mmap` seems to circumvent this for now
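To illustrate the kind of bug this points at: if a tensor's recorded file offset plus its size falls outside the mmap'd region, its `data` pointer is garbage and the first read (e.g. in `get_rows`) segfaults. A minimal bounds-check sketch, with made-up struct and function names (not the `llm` loader API):

```rust
// Hypothetical sketch: detect tensors whose data would lie outside the
// mmap'd file region. Names and layout are illustrative, not llm's API.
struct TensorEntry {
    name: &'static str,
    offset: usize, // byte offset of tensor data within the mapping
    size: usize,   // byte length of tensor data
}

/// Returns the names of tensors that would point past the end of the mapping.
fn validate(mapped_len: usize, tensors: &[TensorEntry]) -> Vec<&'static str> {
    tensors
        .iter()
        .filter(|t| {
            t.offset
                .checked_add(t.size)
                .map_or(true, |end| end > mapped_len)
        })
        .map(|t| t.name)
        .collect()
}

fn main() {
    let tensors = [
        TensorEntry { name: "model/wte", offset: 0, size: 1024 },
        TensorEntry { name: "model/wpe", offset: 2048, size: 4096 }, // ends past the map
    ];
    let bad = validate(4096, &tensors);
    println!("{:?}", bad); // tensors whose data would point outside the mapping
}
```

A check like this after loading would turn the segfault into a diagnosable error, which might help narrow down which of the two tensors is mis-mapped.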
This is definitely something we should investigate and fix, but not a showstopper for now, I think.