
ggml_new_tensor_impl: not enough space in the context's memory pool

Open NotNite opened this issue 1 year ago • 3 comments

Heya! A friend showed this to me and I'm trying to get it to work myself on Windows 10. I've applied the changes from #22 to get it to build (more specifically, I pulled in the new commits from etra0's fork), but the actual executable fails to run, printing this before segfaulting:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 458853944, available 454395136)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 458870468, available 454395136)

I'm trying to use 7B on an i9-13900K (and I have about 30 gigs of memory free right now), and I've verified my hashes with a friend. Any ideas? Thanks!
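For context on why this can happen even with plenty of free RAM: ggml manages its own memory. ggml_init reserves a single fixed-size buffer up front, and every tensor is then carved out of that buffer; when the next tensor does not fit, ggml_new_tensor_impl prints the "needed X, available Y" message above. The sketch below is a rough paraphrase of that model (names follow ggml.h, the pool size is illustrative, and this is not the upstream implementation):

#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml_init reserves one fixed-size arena; nothing grows on demand.
    struct ggml_init_params params = {
        .mem_size   = 512u * 1024 * 1024, // illustrative 512 MiB pool
        .mem_buffer = NULL,               // let ggml allocate the buffer itself
    };
    struct ggml_context * ctx = ggml_init(params);

    // Every tensor is carved out of that arena. Once the cumulative size of
    // all tensors exceeds mem_size, the allocator reports
    // "not enough space in the context's memory pool (needed X, available Y)".
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    printf("allocated a tensor of %zu bytes from the pool\n", (size_t) ggml_nbytes(t));

    ggml_free(ctx);
    return 0;
}

So the numbers in the error track how big the pool was sized at startup, not how much system memory is actually free.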

NotNite avatar Mar 12 '23 01:03 NotNite

Tried out #31; it, uh, got farther: GGML_ASSERT: D:\code\c++\llama.cpp\ggml.c:9349: false

NotNite avatar Mar 12 '23 04:03 NotNite

Ok, I made an oopsie in that PR; initializing it that way apparently didn't zero out the rest of the fields. I updated the branch, please test it again now!

etra0 avatar Mar 12 '23 05:03 etra0

Ok, I made an oopsie in that PR; initializing it that way apparently didn't zero out the rest of the fields. I updated the branch, please test it again now!

It started to expand the prompt, but with seemingly garbage data: Building a website can be done in 10 simple steps: ╨Ñ╤Ç╨╛╨╜╨╛╨╗╨╛╨│╨╕╤ÿ╨

NotNite avatar Mar 12 '23 06:03 NotNite

Should be good on the latest master; reopen if the issue persists. Make sure to rebuild and regenerate the models after updating.

ggerganov avatar Mar 13 '23 17:03 ggerganov

Hey, I was trying to run this on a RHEL 8 server with 32 CPU cores, and I am getting the same error on my second query.

I am using GPT4All-J v1.3-groovy.

ggml_new_tensor_impl: not enough space in the context's memory pool

eshaanagarwal avatar Jun 12 '23 10:06 eshaanagarwal

Hi @ggerganov @gjmulder, I would appreciate some direction on this, please.

eshaanagarwal avatar Jun 13 '23 07:06 eshaanagarwal

Getting the same issue on Apple M1 Pro with 16GB RAM when trying the example from:

https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain/blob/master/06.private-gpt4all-qa-pdf.ipynb

Using a relatively large PDF with ~200 pages

Stack trace:

gpt_tokenize: unknown token '?'
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 16118890208, available 16072355200)
[1]    62734 segmentation fault  python3
/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
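Two things stand out here. First, the reserved pool (about 16.07 GB) is already close to everything a 16 GB machine has, and pushing a ~200-page PDF through the chain asked for slightly more than that; splitting the document into smaller chunks, or lowering the requested context size, should shrink what the pool needs to hold. Second, the segmentation fault immediately after the "not enough space" message is consistent with the failed allocation handing back a null pointer that then gets dereferenced downstream, though that is an inference, not something confirmed from the code. A hypothetical defensive pattern (not the upstream implementation) would look like:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "ggml.h"

// Check the result of a ggml allocation so pool exhaustion fails with a clear
// message instead of a null-pointer crash somewhere downstream.
static struct ggml_tensor * new_f32_or_die(struct ggml_context * ctx, int64_t n) {
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    if (t == NULL) {
        fprintf(stderr, "ggml pool exhausted while allocating %lld floats; "
                        "increase the pool size or shrink the context\n", (long long) n);
        exit(EXIT_FAILURE);
    }
    return t;
}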

superbsky avatar Jun 18 '23 01:06 superbsky

Same issue when running on Win11 with 64GB RAM (25 GB utilized):

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 450887680, available 446693376)
Traceback (most recent call last):
  File "C:\AI\oobabooga_windows_GPU\text-generation-webui\modules\callbacks.py", line 55, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "C:\AI\oobabooga_windows_GPU\text-generation-webui\modules\llamacpp_model.py", line 92, in generate
    for completion_chunk in completion_chunks:
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 891, in _create_completion
    for token in self.generate(
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 713, in generate
    self.eval(tokens)
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 453, in eval
    return_code = llama_cpp.llama_eval(
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama_cpp.py", line 612, in llama_eval
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
OSError: exception: access violation reading 0x0000000000000028
Output generated in 39.00 seconds (0.00 tokens/s, 0 tokens, context 5200, seed 1177762893)
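If I understand the allocation correctly, the scratch and eval buffers are sized when the model is loaded, based on the model type and the n_ctx/n_batch that were requested, so a run that later feeds in more context than they were sized for (the output above reports "context 5200") can overflow them, and the access violation at 0x0000000000000028 again looks like a downstream dereference of a failed allocation. In llama-cpp-python the n_ctx and n_batch arguments map onto the underlying llama_context_params. A minimal C sketch of where those knobs live, assuming the mid-2023 llama.h API (treat the exact function names as approximate for your version):

#include <stdio.h>
#include "llama.h"

int main(int argc, char ** argv) {
    const char * model_path = argc > 1 ? argv[1] : "models/7B/ggml-model-q4_0.bin";

    // Size the context for the longest prompt you actually intend to feed it.
    struct llama_context_params params = llama_context_default_params();
    params.n_ctx   = 2048; // KV cache and eval buffers are sized from this
    params.n_batch = 512;  // larger batches need larger scratch buffers

    struct llama_model * model = llama_load_model_from_file(model_path, params);
    if (model == NULL) {
        fprintf(stderr, "failed to load model: %s\n", model_path);
        return 1;
    }

    struct llama_context * ctx = llama_new_context_with_model(model, params);
    if (ctx == NULL) {
        fprintf(stderr, "failed to create llama context\n");
        llama_free_model(model);
        return 1;
    }

    // ... tokenize and evaluate prompts no longer than n_ctx here ...

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}

Keeping the prompt plus generated tokens within the n_ctx the context was created with (or raising n_ctx and accepting the larger buffers) avoids asking pools that were sized for less to hold more.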

dzupin avatar Jul 18 '23 13:07 dzupin

Same issue when running on Win11 with 64GB RAM (25 GB utilized): [snip]

Oh hey, exact same error:

ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 452859040, available 446693376)

LoganDark avatar Jul 25 '23 01:07 LoganDark

Same issue here; I've tried a combination of settings but just keep getting the memory error, even though both RAM and GPU VRAM are at less than 50% utilization.

I had to follow the guide here to build llama-cpp with GPU support, as it wasn't working previously, but even before that it was giving the same error (side note: GPU support natively does work in oobabooga on Windows!?):
https://github.com/abetlen/llama-cpp-python/issues/182

Anyone have any ideas?

HW: Intel i9-10900K OC @ 5.3 GHz, 64 GB DDR4-2400 / PC4-19200, 12 GB Nvidia GeForce RTX 3060

Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6
llama.cpp: loading model from models/llama7b/llama-deus-7b-v3.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 2927.79 MB (+ 1024.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 384 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 10 repeating layers to GPU
llama_model_load_internal: offloaded 10/35 layers to GPU
llama_model_load_internal: total VRAM used: 1470 MB
llama_new_context_with_model: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |

What would you like to know about the policies?

test

ggml_new_object: not enough space in the context's memory pool (needed 10882896, available 10650320)
Traceback (most recent call last):
  File "H:\AI_Projects\Indexer_Plus_GPT\chat.py", line 84, in <module>
    main()
  File "H:\AI_Projects\Indexer_Plus_GPT\chat.py", line 55, in main
    res = qa(query)
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in __call__
    raise e
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in __call__
    self._call(inputs, run_manager=run_manager)
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\retrieval_qa\base.py", line 133, in _call
    answer = self.combine_documents_chain.run(
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 441, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in __call__
    raise e
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in __call__
    self._call(inputs, run_manager=run_manager)
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\combine_documents\base.py", line 106, in _call
    output, extra_return_dict = self.combine_docs(
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 165, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 252, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in __call__
    raise e
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in __call__
    self._call(inputs, run_manager=run_manager)
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 92, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 102, in generate
    return self.llm.generate_prompt(
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 188, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 281, in generate
    output = self._generate_helper(
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 225, in _generate_helper
    raise e
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 212, in _generate_helper
    self._generate(
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 604, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\llamacpp.py", line 229, in _call
    for token in self.stream(prompt=prompt, stop=stop, run_manager=run_manager):
  File "C:\Program Files\Python310\lib\site-packages\langchain\llms\llamacpp.py", line 279, in stream
    for chunk in result:
  File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 899, in _create_completion
    for token in self.generate(
  File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 721, in generate
    self.eval(tokens)
  File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 461, in eval
    return_code = llama_cpp.llama_eval(
  File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama_cpp.py", line 678, in llama_eval
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
OSError: exception: access violation reading 0x0000000000000000
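As a sanity check on the sizes involved: the "kv self size = 1024.00 MB" line in the load log above follows directly from the hyperparameters it prints, assuming an f16 K and V cache (2 bytes per element). A small back-of-the-envelope sketch:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uint64_t n_layer = 32;          // from: llama_model_load_internal: n_layer = 32
    const uint64_t n_ctx   = 2048;        // from: llama_model_load_internal: n_ctx = 2048
    const uint64_t n_embd  = 4096;        // from: llama_model_load_internal: n_embd = 4096
    const uint64_t bytes_per_element = 2; // assumed f16 KV cache

    // K and V each hold n_ctx x n_embd elements per layer.
    const uint64_t kv_bytes = 2 * n_layer * n_ctx * n_embd * bytes_per_element;
    printf("kv self size = %.2f MB\n", kv_bytes / (1024.0 * 1024.0));
    // Prints: kv self size = 1024.00 MB
    return 0;
}

The KV cache, scratch buffers, and eval pools are all reserved up front from numbers like these, which is why the "not enough space" errors can appear while overall RAM and VRAM utilization still look low.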

omarelanis avatar Jul 26 '23 15:07 omarelanis

Same here... any solutions already???

jiapei100 avatar Aug 07 '23 21:08 jiapei100

Solved this by going back to llama-cpp-python version 0.1.74

sherrmann avatar Sep 02 '23 22:09 sherrmann

Solved this by going back to llama-cpp-python version 0.1.74

Well, this has nothing to do with Python.

LoganDark avatar Sep 02 '23 22:09 LoganDark

Same here... any solutions already???

dereklll avatar Sep 12 '23 05:09 dereklll

@dereklll This issue was closed 6 months ago; I'd suggest creating a new one.

sozforex avatar Sep 12 '23 12:09 sozforex

Same issue on a RunPod GPU machine; tried two different GPUs.

dillfrescott avatar Nov 15 '23 05:11 dillfrescott