llama.cpp
ggml_new_tensor_impl: not enough space in the context's memory pool
Heya! A friend showed this to me and I'm trying to get it to work myself on Windows 10. I've applied the changes as seen in #22 to get it to build (more specifically, I pulled in the new commits from etra0's fork), but the actual executable fails to run, printing this before segfaulting:
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 458853944, available 454395136)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 458870468, available 454395136)
I'm trying to use 7B on an i9-13900K (and I have about 30 gigs of memory free right now), and I've verified my hashes with a friend. Any ideas? Thanks!
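For background on where this message comes from: ggml reserves a single fixed-size memory pool when a context is created, and every tensor is carved out of that pool, so the error means the pool was sized smaller than the compute graph actually needs. Below is a minimal sketch of the mechanism (my own illustration against the public ggml API, not code from the model loader; depending on the ggml version, an allocation that doesn't fit either prints the message and returns NULL or aborts):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml_init reserves the whole pool up front; every later
    // ggml_new_tensor_* call carves its tensor out of this buffer.
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024, // deliberately tiny 16 MB arena
        .mem_buffer = NULL,             // let ggml allocate the buffer
        .no_alloc   = false,
    };
    struct ggml_context *ctx = ggml_init(params);

    // Each 4 MB tensor eats into the pool; after a few iterations the
    // next allocation overruns it and triggers the same
    // "not enough space in the context's memory pool" diagnostic.
    for (int i = 0; i < 32; ++i) {
        struct ggml_tensor *t =
            ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024 * 1024);
        if (t == NULL) break; // pool exhausted (on builds that return NULL)
        printf("tensor %d ok, used %zu bytes\n", i, ggml_used_mem(ctx));
    }

    ggml_free(ctx);
    return 0;
}
```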
Tried out #31 - it, uh, got farther: GGML_ASSERT: D:\code\c++\llama.cpp\ggml.c:9349: false
Ok, I made an oopsie in that PR: initializing it that way apparently didn't zero out the rest of the fields. I've updated the branch, please test it again now!
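A side note on the bug class, as a generic C illustration rather than the actual diff from that PR: a designated initializer zeroes every field you don't name, while declaring the struct and assigning members afterwards leaves the remaining fields holding whatever was on the stack.

```c
#include <stdio.h>

struct params { int a; int b; int c; };

int main(void) {
    // Designated initializer: the standard zeroes the unnamed fields.
    struct params p1 = { .a = 1 };  // p1.b == 0, p1.c == 0, guaranteed

    // Declare-then-assign: b and c are left uninitialized, which is
    // the kind of mistake described above.
    struct params p2;
    p2.a = 1;                       // p2.b and p2.c hold stack garbage

    printf("p1 = {%d, %d, %d}\n", p1.a, p1.b, p1.c); // always {1, 0, 0}
    printf("p2.a = %d\n", p2.a); // reading p2.b or p2.c is undefined
    return 0;
}
```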
It started to expand the prompt, but with seemingly garbage data: Building a website can be done in 10 simple steps: ╨Ñ╤Ç╨╛╨╜╨╛╨╗╨╛╨│╨╕╤ÿ╨
Should be good on latest master - reopen if the issue persists. Make sure to rebuild and regenerate the models after updating.
Hey, I was trying to run this on a RHEL 8 server with 32 CPU cores and I am getting the same error on my second query.
I am using GPT4All-J v1.3-groovy.
ggml_new_tensor_impl: not enough space in the context's memory pool
Hi @ggerganov @gjmulder, I would appreciate some direction on this, please.
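The "second query" detail fits the fixed-pool picture: the arena is sized once at load time, and state that accumulates across evaluations can push a later allocation over the limit even though the first call fit. If you can rebuild, a debug helper along these lines (hypothetical, my own sketch; it assumes only the ggml_used_mem and ggml_get_mem_size accessors that ggml.h exposes) makes the pool visibly fill up between queries:

```c
#include <stdio.h>
#include "ggml.h"

// Hypothetical helper, not part of ggml: report how much of the
// context's fixed arena is used and how much headroom remains.
static void log_pool_headroom(const struct ggml_context *ctx, const char *tag) {
    size_t used  = ggml_used_mem(ctx);     // bytes handed out so far
    size_t total = ggml_get_mem_size(ctx); // arena size fixed at ggml_init
    fprintf(stderr, "%s: pool %zu / %zu bytes used (%zu free)\n",
            tag, used, total, total - used);
}
```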
Getting the same issue on Apple M1 Pro with 16GB RAM when trying the example from:
https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain/blob/master/06.private-gpt4all-qa-pdf.ipynb
Using a relatively large PDF with ~200 pages
Stack trace:
gpt_tokenize: unknown token '?'
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 16118890208, available 16072355200)
[1]    62734 segmentation fault  python3
/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Same issue when running on Win11 with 64GB RAM (25 GB utilized):
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 450887680, available 446693376)
Traceback (most recent call last):
  File "C:\AI\oobabooga_windows_GPU\text-generation-webui\modules\callbacks.py", line 55, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "C:\AI\oobabooga_windows_GPU\text-generation-webui\modules\llamacpp_model.py", line 92, in generate
    for completion_chunk in completion_chunks:
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 891, in _create_completion
    for token in self.generate(
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 713, in generate
    self.eval(tokens)
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 453, in eval
    return_code = llama_cpp.llama_eval(
  File "C:\AI\oobabooga_windows_GPU\installer_files\env\lib\site-packages\llama_cpp\llama_cpp.py", line 612, in llama_eval
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
OSError: exception: access violation reading 0x0000000000000028
Output generated in 39.00 seconds (0.00 tokens/s, 0 tokens, context 5200, seed 1177762893)
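One detail that stands out in that output is "context 5200": the scratch pools are sized from n_ctx when the model is loaded, so evaluating a prompt longer than the configured window is a plausible way to overrun them. As an illustration only, here is how loading with a larger n_ctx looked in the llama.cpp C API of roughly that period (llama_context_default_params, llama_load_model_from_file, llama_new_context_with_model; names and fields vary across versions, so treat this as a sketch):

```c
#include "llama.h"

// Sketch: size the context window to cover the longest prompt you
// intend to evaluate, since scratch buffers are derived from n_ctx.
struct llama_context *load_with_larger_ctx(const char *model_path) {
    struct llama_context_params params = llama_context_default_params();
    params.n_ctx = 8192; // larger than the 5200-token context seen above

    struct llama_model *model = llama_load_model_from_file(model_path, params);
    if (model == NULL) {
        return NULL;
    }
    return llama_new_context_with_model(model, params);
}
```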
Oh hey, exact same error:
ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 452859040, available 446693376)
Same issue here; I've tried a combination of settings but keep getting the memory error even though both RAM and GPU VRAM are under 50% utilization.
I had to follow the guide here to build llama-cpp with GPU support, as it wasn't working previously, but even before that it was giving the same error (side note: GPU support does work natively in the oobabooga Windows build!?):
https://github.com/abetlen/llama-cpp-python/issues/182
Anyone have any ideas?
HW: Intel i9-10900K OC @ 5.3 GHz, 64 GB DDR4-2400 / PC4-19200, 12 GB Nvidia GeForce RTX 3060
Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6
llama.cpp: loading model from models/llama7b/llama-deus-7b-v3.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 2927.79 MB (+ 1024.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 384 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 10 repeating layers to GPU
llama_model_load_internal: offloaded 10/35 layers to GPU
llama_model_load_internal: total VRAM used: 1470 MB
llama_new_context_with_model: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
What would you like to know about the policies?
test
ggml_new_object: not enough space in the context's memory pool (needed 10882896, available 10650320)
Traceback (most recent call last):
File "H:\AI_Projects\Indexer_Plus_GPT\chat.py", line 84, in
Same here... any solutions already???
Solved this by going back to llama-cpp-python version 0.1.74
Well, this has nothing to do with Python.
@dereklll This issue was closed 6 months ago; I'd suggest creating a new one.
Same issue on a RunPod GPU machine; tried 2 different GPUs.