
Process is Killed on CPU

Open Anas-Dew opened this issue 1 year ago • 14 comments

[localGPT] main*% [2d,16h,12m] →
$ python3 ingest.py --device_type cpu
Loading documents from /home/ni-user/Desktop/localGPT/SOURCE_DOCUMENTS
Loaded 1 documents from /home/ni-user/Desktop/localGPT/SOURCE_DOCUMENTS
Split into 148 chunks of text
load INSTRUCTOR_Transformer
Killed

[localGPT] main*% [2d,16h,12m] →
$ python3 run_localGPT.py --device_type cpu
Running on: cpu
load INSTRUCTOR_Transformer
Killed

[localGPT] main*% [2d,16h,13m] →

Anas-Dew avatar Jun 01 '23 16:06 Anas-Dew

same problem - no solution yet

achillez avatar Jun 01 '23 22:06 achillez

@Anas-Dew @achillez Can you share your hardware configuration and memory utilization while the code is running?
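If it helps, one way to capture that (a rough sketch assuming a standard Linux setup with GNU time available; flags may differ elsewhere) is to run the scripts under time's verbose mode and watch memory in a second terminal:

$ /usr/bin/time -v python3 ingest.py --device_type cpu
$ watch -n 1 free -h

The "Maximum resident set size" line from GNU time shows peak RAM usage, and free -h shows RAM and swap consumption while the process is running.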

PromtEngineer avatar Jun 02 '23 04:06 PromtEngineer

Sure, it's an i7 11700K (8 physical cores, 16 virtual cores) with 32GB physical memory (16GB allocated to WSL) and a GTX 1660 Ti with 6GB of memory. Not sure the GPU config matters, since I see the exact same issue in CPU mode.

achillez avatar Jun 02 '23 06:06 achillez

It was 3GB of RAM and a single-core CPU, I guess; I was running it on a cloud computer (Neverinstall).

Anas-Dew avatar Jun 02 '23 06:06 Anas-Dew

Why did you close this, Anas? It's still an issue, no?

On Thu, Jun 1, 2023 at 11:45 PM Anas Raza wrote:

Closed #45 https://github.com/PromtEngineer/localGPT/issues/45 as completed.


achillez avatar Jun 02 '23 07:06 achillez

I usually do this when trying out new things.

Anas-Dew avatar Jun 03 '23 16:06 Anas-Dew

Here's the Python trace of the error. It appears to fail somewhere in torch.

init.py(101): if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
init.py(103): elif nonlinearity == 'tanh':
init.py(105): elif nonlinearity == 'relu':
init.py(107): elif nonlinearity == 'leaky_relu':
init.py(108): if param is None:
init.py(110): elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):
init.py(112): negative_slope = param
init.py(115): return math.sqrt(2.0 / (1 + negative_slope ** 2))
init.py(409): std = gain / math.sqrt(fan)
init.py(410): bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
init.py(411): with torch.no_grad():
 --- modulename: grad_mode, funcname: __init__
grad_mode.py(49): if not torch._jit_internal.is_scripting():
 --- modulename: _jit_internal, funcname: is_scripting
_jit_internal.py(1121): return False
grad_mode.py(50): super().__init__()
grad_mode.py(51): self.prev = False
 --- modulename: grad_mode, funcname: __enter__
grad_mode.py(54): self.prev = torch.is_grad_enabled()
grad_mode.py(55): torch.set_grad_enabled(False)
 --- modulename: grad_mode, funcname: __init__
grad_mode.py(150): self.prev = torch.is_grad_enabled()
grad_mode.py(151): torch._C.set_grad_enabled(mode)
grad_mode.py(152): self.mode = mode
init.py(412): return tensor.uniform_(-bound, bound)
Killed
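For what it's worth, a bare "Killed" at the end of a run usually means the kernel's OOM killer stopped the process once RAM and swap were exhausted, rather than a crash inside torch itself. On most Linux systems that can be checked with something like (the exact log wording varies by kernel version):

$ dmesg | grep -i "out of memory"
$ dmesg | grep -i "killed process"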

achillez avatar Jun 03 '23 20:06 achillez

On my machine, the program takes up so much memory that my 16 gigabytes of RAM overflows (my computer freezes for a second). The problem you are facing could be somewhat similar to mine.

orhnk avatar Jun 04 '23 02:06 orhnk

On my machine, the program takes up so much memory that my 16 gigabytes of RAM overflows (my computer freezes for a second). The problem you are facing could be somewhat similar to mine.

This is exactly the problem. I increased it to 24GB, but it still eats away memory, chews through the swap file (8GB), and then gets killed.

How much memory does this take? We really need a smaller model. 16GB is pretty common on desktops.

achillez avatar Jun 04 '23 04:06 achillez

same here!

Peixer avatar Jun 05 '23 01:06 Peixer

I'm having the same issue when running on CPU

$ python run_localGPT.py --device_type cpu

Running on: cpu
load INSTRUCTOR_Transformer
max_seq_length  512
Using embedded DuckDB with persistence: data will be stored in: ....
Killed

naorsabag avatar Jun 05 '23 07:06 naorsabag

Fixed this by increasing the swap file size. It appears you need around 40GB of memory (RAM + swap) to avoid the app crashing. Now it runs and asks for a prompt. However, I can't get it to respond with a legitimate answer.

Not sure why; trying a rerun with ingest.py + run_localGPT.py.

achillez avatar Jun 05 '23 23:06 achillez

@achillez what command did you use to set the swap file size?

Peixer avatar Jun 05 '23 23:06 Peixer

For WSL you have to edit your .wslconfig file. Search online for the options: https://learn.microsoft.com/en-us/windows/wsl/wsl-config
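As an illustration only (the sizes below are placeholders, not recommendations), a .wslconfig in your Windows user profile folder could look like:

[wsl2]
memory=24GB
swap=40GB

Run wsl --shutdown afterwards so the new limits take effect the next time WSL starts.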

For a default Linux install there are a few steps to take to unlink your swap file, change the size, etc.
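Roughly, assuming an existing swap file at /swapfile (paths, sizes, and whether it's listed in /etc/fstab will vary on your system):

$ sudo swapoff /swapfile
$ sudo fallocate -l 40G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

free -h afterwards should show the new swap size.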

achillez avatar Jun 06 '23 00:06 achillez