
ISSUE: Out of memory / no option to apply a maximum memory usage limit.

d2rgaming-9000 opened this issue 1 year ago · 2 comments

Python is using all of the available memory instead of keeping usage under a maximum limit, which makes the program crash abruptly.

Here is what I get when I run ingest.py:

```
llama.cpp: loading model from /home.. (long path)
..
..
```

(Note: this one loads successfully.)

```
gptj_model_load: loading model from '/home/.... (long path)
..
...
gptj_model_load: ......................: Job 1, 'python privateGPT.py' terminated by signal SIGKILL (Forced quit)
```

(Note: it stops abruptly while loading the GPT-J model, after it reaches maximum memory usage.)

journalctl gives me more information about the error when I look at the logs:

```
Out of memory: Killed process 35854 (python) total-vm:12425720kB, anon-rss:581472kB, file-rss:1436kB, shmem-rss:0kB, UID:1000 pgtables:13984kB oom_score_adj:200

vte-spawn-da1d5e81-f9b0-4a5a-96e4-1f8b8ecc8532.scope: A process of this unit has been killed by the OOM killer.

[email protected]: A process of this unit has been killed by the OOM killer.
```

Is there an option I can pass in the same command to tell it to use at most X amount of memory for these tasks?

Or should I write another Python program to handle this and manually set a maximum limit? (I am not sure that would even work: various hooks and child processes get spawned that would probably not be controlled by it, so it would still crash.)

I wish there were a way to set a limit, or to fix this issue of reaching maximum memory usage, since I am testing this on my laptop.

I am thinking about implementing something like this:

```
# importing libraries
import signal
import resource

# handler for SIGXCPU, delivered when the CPU-time limit is exceeded
def time_exceeded(signo, frame):
    print("Time's up!")
    raise SystemExit(1)

def set_max_runtime(seconds):
    # set the soft CPU-time limit, keeping the existing hard limit
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))
    signal.signal(signal.SIGXCPU, time_exceeded)

# max run time of 15 seconds (note: RLIMIT_CPU limits CPU time, not memory)
if __name__ == '__main__':
    set_max_runtime(15)
    while True:
        pass
```

to see if that solves the problem for the time being (though I highly doubt it, unless I can control all the spawned child processes that come with it).
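
Since the goal here is a memory cap rather than a CPU-time cap, a minimal sketch using `RLIMIT_AS` (the address-space limit from the same `resource` module) might look like the following; the 4 GiB figure is an arbitrary example, and this only constrains the current process, not any children it spawns:

```
import resource

def set_max_memory(max_bytes):
    # cap this process's virtual address space; allocations beyond
    # the limit raise MemoryError in Python (ENOMEM in C extensions)
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

if __name__ == '__main__':
    set_max_memory(4 * 1024 ** 3)  # example cap: 4 GiB
    try:
        buf = bytearray(8 * 1024 ** 3)  # deliberately over the limit
    except MemoryError:
        print("allocation refused by RLIMIT_AS")
```

Since the logs above come from systemd's OOM handling, another option worth trying is launching under a cgroup memory cap, e.g. `systemd-run --user --scope -p MemoryMax=4G python privateGPT.py`, which also covers spawned child processes.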

d2rgaming-9000 avatar May 14 '23 22:05 d2rgaming-9000

Also see #104.

They seem to be hitting it occasionally; I am hitting it every time.

d2rgaming-9000 avatar May 14 '23 23:05 d2rgaming-9000

Can the context size `n_ctx` be reduced from 2048 to 1024?

```
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 4505.45 MB
gptj_model_load: memory_size =   896.00 MB, n_mem = 57344
gptj_model_load: ................................... done
gptj_model_load: model size =  3609.38 MB / num tensors = 285
```


I just updated the ctx value in `.env` to 1024, but the load log above still shows 2048:

```
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1024
```
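
A likely explanation (an assumption, not verified against the loader source): the `n_ctx` printed by `gptj_model_load` is a hyperparameter read from the `.bin` file header itself, so `MODEL_N_CTX` in `.env` cannot change what gets printed. A minimal sketch that reads the header directly, assuming the legacy GGML GPT-J layout of a uint32 magic followed by seven int32 hyperparameters in the order shown in the log:

```
import struct

# assumption: legacy GGML GPT-J header = magic (uint32) followed by
# n_vocab, n_ctx, n_embd, n_head, n_layer, n_rot, f16 (each int32)
with open('models/ggml-gpt4all-j-v1.3-groovy.bin', 'rb') as f:
    magic = struct.unpack('<I', f.read(4))[0]
    n_vocab, n_ctx, n_embd, n_head, n_layer, n_rot, f16 = struct.unpack('<7i', f.read(28))

print(f"n_ctx stored in the model file: {n_ctx}")  # expected: 2048
```

If that holds, reducing memory use would have to come from a smaller model or from the ingestion/prompt settings rather than from `MODEL_N_CTX` alone.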

hatgit avatar May 23 '23 23:05 hatgit