Segmentation fault Windows 11 Docker
I tried installing dalai with Docker on Windows. Currently I am getting the following error when I try to generate a response with debug mode on:
```
root@7788cdbedf9c:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
>
> ### Instruction:
> >PROMPT
>
> ### Response:
> "
main: seed = 1679656530
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@7788cdbedf9c:~/dalai/alpaca# exit
exit
```
Looking at the llama.cpp project, it seems they have tried to fix some segmentation faults but were unsuccessful. Perhaps this is the issue I am facing, but I do not know: https://github.com/ggerganov/llama.cpp/commit/3cd8dde0d1357b7f11bdd25c45d5bf5e97e284a0
Any tips on how to debug this or to get a better error would be appreciated.
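In case it helps, one thing I plan to try next is running the binary under gdb to get an actual backtrace instead of a bare "Segmentation fault" (assuming gdb can be installed in the container; the image appears to be Debian/Ubuntu-based):

```sh
# Install gdb inside the container (assumes a Debian/Ubuntu-based image)
apt-get update && apt-get install -y gdb

# Re-run the exact failing command under the debugger
gdb --args /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 \
  --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 \
  --repeat_last_n 64 --repeat_penalty 1.3 -p "..."

# At the (gdb) prompt:
#   run   -- reproduce the crash
#   bt    -- print the backtrace at the point of the segfault
```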
I have the exact same problem. I tried running it in the terminal via Docker, and also cloning alpaca.cpp and running `make chat`, but without success. If I find out anything I will post it here.
Just downloaded the repo and installed the 30B model, having the same issue. Here's the debug output:
```
root@81743ba9c2e2:~/dalai/alpaca# /root/dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/30B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
>
> ### Instruction:
> >PROMPT
>
> ### Response:
> "
main: seed = 1680109480
llama_model_load: loading model from 'models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
Segmentation fault
root@81743ba9c2e2:~/dalai/alpaca# exit
exit
```
I also have this issue with alpaca 30B and llama 30B, exactly the same error (but the ggml ctx size is about 21000 MB for me).
I have 32 GB of RAM, and Docker sometimes consumes a lot of it (via the vmmem process), so at times I don't have the ~22 GB needed. But even when I do have enough free RAM, I still can't run the model...
So I'd bet 32 GB of RAM is not enough to run the 30B model using Docker? :thinking: How much do you have?
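One thing worth checking on Windows: Docker Desktop runs inside the WSL2 VM, so the ceiling is often the VM's memory cap rather than physical RAM. As a sketch (the values below are just an example for a 32 GB machine, not a recommendation), the cap can be raised via `%UserProfile%\.wslconfig`:

```ini
# %UserProfile%\.wslconfig -- apply by restarting WSL with: wsl --shutdown
[wsl2]
memory=28GB   # let the VM use up to 28 GB of the host's RAM
swap=8GB      # optional swap headroom during model load
```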
Here I described my experience running models on Windows 10: https://github.com/cocktailpeanut/dalai/issues/330#issuecomment-1493062415
My assumption is that the issue comes from this model requiring a lot of RAM. Can anybody confirm or rule this out? I believe the model is loaded entirely into RAM, and that is where it breaks.
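A quick sanity check inside the container is to compare the model's on-disk size (roughly what has to be loaded) against the memory the container can actually see, which Docker/WSL2 may cap well below the host's physical RAM:

```sh
# Size of the quantized weights on disk
ls -lh models/30B/ggml-model-q4_0.bin

# Memory visible inside the container
free -h
```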
In my case the context size was causing this issue. I fixed it by adding a new config option to the UI that lets me adjust the context size.
I was trying it on a server with 6 GB of RAM, and in my case a context size below 1024 seems to work without any errors.
PR for the same: #424
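For anyone who wants to test the same idea from the command line instead of the UI: builds in the llama.cpp/alpaca.cpp lineage usually expose the context size as a flag (commonly `-c` / `--ctx_size`; check `./main --help` for your build). A sketch of the invocation from above with a reduced context:

```sh
# Same command as above with a smaller prompt context; the flag name
# (-c / --ctx_size) is an assumption -- verify it with ./main --help
/root/dalai/alpaca/main --seed -1 --threads 4 --ctx_size 512 \
  --n_predict 200 --model models/30B/ggml-model-q4_0.bin \
  --top_k 40 --top_p 0.9 --temp 0.8 \
  --repeat_last_n 64 --repeat_penalty 1.3 -p "..."
```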