ISSUE: Out of memory / no option to apply a maximum memory usage safety limit.
Python uses up the whole memory instead of keeping everything under a 'MAX' limit, which makes the program crash abruptly.
Here is what I get when I run `ingest.py`:
```
llama.cpp: loading model from /home.. (long syntax)
..
..
```
(NOTE: this one loads successfully.)
```
gptj_model_load: loading model from '/home/.... (long syntax)
..
...
gptj_model_load: ......................: Job 1, 'python privateGPT.py' terminated by signal SIGKILL (Forced quit)
```
(NOTE: it stops abruptly while loading the GPT-J model, after it reaches maximum memory usage.)
Journalctl gives me more information about the error when I look at the logs:
```
Out of memory: Killed process 35854 (python) total-vm:12425720kB, anon-rss:581472kB, file-rss:1436kB, shmem-rss:0kB, UID:1000 pgtables:13984kB oom_score_adj:200
vte-spawn-da1d5e81-f9b0-4a5a-96e4-1f8b8ecc8532.scope: A process of this unit has been killed by the OOM killer.
user@1000.service: A process of this unit has been killed by the OOM killer.
```
Is there any option I can pass in the same command to tell it to use at most X amount of memory for these tasks?
Or should I write another Python program to handle this kind of issue and manually set a maximum limit? (I am not sure that would even work: there seem to be various hooks and other processes spawning that would probably not be sufficiently controlled, so it would still crash.)
I wish there were a way to set a limit, or to fix this issue of hitting maximum memory usage, as I am using this to test on my laptop.
I am thinking about implementing something like this:
```python
# importing libraries
import signal
import resource

# handler invoked when the CPU time limit is exceeded
def time_exceeded(signo, frame):
    print("Time's up!")
    raise SystemExit(1)

def set_max_runtime(seconds):
    # setting up the resource limit
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))
    signal.signal(signal.SIGXCPU, time_exceeded)

# max run time of 15 seconds (RLIMIT_CPU is measured in seconds)
if __name__ == '__main__':
    set_max_runtime(15)
    while True:
        pass
```
to see if that solves the problem for the time being (I highly doubt it, unless I can control all the spawners and the other processes that come up with it).
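For the record, `RLIMIT_CPU` only caps CPU seconds, not memory, so a variant that caps the address space with `RLIMIT_AS` might be closer to what I actually need. A rough sketch, assuming Linux (the 6 GiB figure is just an example value, not a recommendation):

```python
import resource

def set_max_memory(max_bytes):
    # cap the process's total address space; once the limit is hit,
    # further allocations fail (e.g. MemoryError in Python) instead of
    # the kernel's OOM killer sending SIGKILL
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

if __name__ == '__main__':
    set_max_memory(6 * 1024 ** 3)  # example: 6 GiB
    # ... load the models / run the ingestion from here ...
```

Since rlimits are inherited by child processes, this would at least also apply to anything the script spawns.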
Also see #104. They seem to be hitting it occasionally; however, I am getting it every time.
Can the context size `n_ctx` be reduced from 2048 to 1024?
```
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size = 896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
```
I just updated the ctx value in `.env` to 1024, but it still returns the above, showing 2048:
```
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1024
```
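For context, my understanding of the wiring (a rough sketch from memory; the names may not match the current privateGPT source exactly) is that the script loads `.env` and passes `MODEL_N_CTX` on to the model wrapper, something like:

```python
# hypothetical sketch; the exact wiring in privateGPT may differ
import os
from dotenv import load_dotenv
from langchain.llms import GPT4All

load_dotenv()  # pulls MODEL_PATH, MODEL_N_CTX, ... from .env

model_path = os.environ.get("MODEL_PATH")
# environment values are strings, so the context size needs a cast
model_n_ctx = int(os.environ.get("MODEL_N_CTX", "2048"))

# whether the loader honors this value, rather than the one baked into
# the model file, is exactly what I am unsure about
llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend="gptj")
```

As far as I can tell, the `n_ctx = 2048` printed by `gptj_model_load` is read from the model file's own header, which might be why the log still shows 2048 regardless of the `.env` setting.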