
Segmentation fault Windows 11 Docker

Open jak6jak opened this issue 1 year ago • 6 comments

I tried installing dalai with Docker on Windows. Currently I am getting the following error when I try generating a response with debug mode on:

root@7788cdbedf9c:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
> 
> ### Instruction:
> >PROMPT
> 
> ### Response:
> "
main: seed = 1679656530
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@7788cdbedf9c:~/dalai/alpaca# exit
exit

Looking at the llama.cpp project, it seems that they have tried to fix some segmentation faults but were unsuccessful. Perhaps this is the issue I am facing, but I do not know. https://github.com/ggerganov/llama.cpp/commit/3cd8dde0d1357b7f11bdd25c45d5bf5e97e284a0

Any tips on how to debug this or to get a better error would be appreciated.
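
One way to get a real backtrace instead of a bare "Segmentation fault" would be to re-run the binary under gdb inside the container. A minimal sketch, assuming gdb can be installed in the image (the prompt string is shortened here for readability):

```
# Install gdb in the container and re-run the failing command under it
apt-get update && apt-get install -y gdb

gdb --args /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 \
    --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 \
    --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "PROMPT"

# At the (gdb) prompt:
#   run   -- start the process; gdb stops when the SIGSEGV is raised
#   bt    -- print the backtrace showing where it crashed
```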

jak6jak avatar Mar 24 '23 11:03 jak6jak

I have the exact same problem. I tried running it in the terminal via Docker, and also cloning alpaca.cpp and running make chat, but without success. If I learn anything I will post it here.
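
For reference, the alpaca.cpp steps I mean are roughly the following (a sketch; the repository URL and the model path are assumptions based on the upstream project and dalai's default layout):

```
# Build alpaca.cpp from source and run the chat binary directly,
# bypassing dalai entirely
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat

# Point it at the same quantized model file dalai downloaded
./chat -m ~/dalai/alpaca/models/30B/ggml-model-q4_0.bin
```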

christopherorea avatar Mar 24 '23 17:03 christopherorea

Just downloaded the repo and installed the 30B model, having the same issue. Here's the debug output:

/root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
>PROMPT

### Response:
"
exit
root@81743ba9c2e2:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
>PROMPT

### Response:
"
main: seed = 1680109480
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@81743ba9c2e2:~/dalai/alpaca# exit
exit

FrancescoGrazioso avatar Mar 29 '23 17:03 FrancescoGrazioso

I also have this issue with alpaca 30B and llama 30B, exactly the same error (but the ggml ctx size is about 21000 MB for me).

I have 32 GB of RAM; Docker sometimes seems to consume a lot of it (via the vmmem process), so I sometimes don't have the ~22 GB needed. But even when I do have enough free RAM, I still can't run the model...

So I bet 32 GB of RAM is not enough for running the 30B model using Docker? :thinking: How much do you have?
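
One thing worth checking: Docker Desktop on Windows runs inside a WSL2 VM (the vmmem process), and that VM's memory is capped independently of the host's RAM, so the container may see far less than 32 GB. A sketch of raising the cap via %UserProfile%\.wslconfig (the exact values here are illustrative, not recommendations):

```
# %UserProfile%\.wslconfig -- limits for the WSL2 VM that backs Docker Desktop
[wsl2]
memory=28GB   # let the VM use up to 28 GB (the default is a fraction of host RAM)
swap=16GB     # extra swap can help a model that only just doesn't fit

# Apply by shutting the VM down from PowerShell, then restarting Docker:
#   wsl --shutdown
```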

glozachmeur avatar Mar 30 '23 09:03 glozachmeur

Here I described my experience running models on Windows 10: https://github.com/cocktailpeanut/dalai/issues/330#issuecomment-1493062415

toolchild avatar Apr 01 '23 19:04 toolchild

My assumption is that the issue comes from the fact that these models require a lot of RAM on your machine. Can anybody confirm or rule this out? I believe the model is loaded entirely into RAM when it starts, and that is the reason it breaks.
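
A quick way to test this hypothesis from inside the container (a sketch; it assumes these standard tools are present in the image, and the file size is a rough estimate, not a measurement):

```
# How big is the model file itself? (the 30B q4_0 file is roughly 19-20 GB)
ls -lh models/30B/ggml-model-q4_0.bin

# How much memory does the container actually see?
free -h

# If the kernel's OOM killer stepped in, it usually leaves a trace
dmesg | grep -iE "out of memory|killed process"
```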

christopherorea avatar Apr 05 '23 11:04 christopherorea

In my case the context size was causing this issue; I fixed it by adding a new config option to the UI that lets me adjust the context size.

I was using a server with 6 GB of RAM to try it; in my case, a context size below 1024 seems to work without any errors.

PR for the same: #424
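
For anyone who wants to test this without the UI change, a hypothetical equivalent from the command line, assuming the bundled binary exposes llama.cpp's --ctx_size flag (I have not verified that dalai's build includes it):

```
# Same invocation as above, but with the context window reduced to 512 tokens,
# which shrinks the ggml ctx allocation and may avoid the segfault on low RAM
/root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 \
    --model models/30B/ggml-model-q4_0.bin --ctx_size 512 -p "PROMPT"
```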

pratyushtiwary avatar Apr 26 '23 16:04 pratyushtiwary