
Bug in CUDA backend when the number of allocated tensors in a given buffer exceeds GGML_CUDA_MAX_NODES

Open YavorGIvanov opened this issue 1 year ago • 0 comments

I created a graph with a custom max graph size far larger than GGML_DEFAULT_GRAPH_SIZE, using the new ggml_new_graph_custom(..) + ggml_graph_overhead_custom(..) API. The CPU backend worked, but the CUDA backend started crashing at random places (cuBLAS, the copy kernel, or elsewhere). The problem is that the CUDA backend buffer context keeps only GGML_CUDA_MAX_NODES (8192) ggml tensor GPU extras and hands them out from a ring buffer, so once more tensors in a given buffer need an extra than that limit allows, tensors start reusing the same extras and overwrite each other.
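For context, a minimal sketch of how such a graph can be set up with the custom-size API (the graph size and the context sizing below are illustrative, not my exact values):

```c
#include "ggml.h"

// hypothetical graph size, well above GGML_DEFAULT_GRAPH_SIZE (2048)
// and above GGML_CUDA_MAX_NODES (8192)
#define MY_GRAPH_SIZE 32768

struct ggml_cgraph * build_large_graph(void) {
    // reserve context memory for the graph metadata and the tensor structs;
    // tensor data itself is left to the backend allocator (no_alloc = true)
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_graph_overhead_custom(MY_GRAPH_SIZE, false) +
                          MY_GRAPH_SIZE * ggml_tensor_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx, MY_GRAPH_SIZE, /*grads =*/ false);

    // ... create more than GGML_CUDA_MAX_NODES tensors and build them into gf ...

    return gf;
}
```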

This kind of corruption is hell to investigate and debug, so I recommend either resizing the extras ring buffer dynamically or at least adding an error or assert in ggml-alloc.c when the limit is exceeded. That may require adding a new backend method to query the maximum allowed number of nodes.
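To illustrate, here is a simplified sketch of the ring-buffer pattern described above and the kind of check being proposed. This is not the actual ggml-cuda.cu code; the names and the `in_use` counter are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GGML_CUDA_MAX_NODES 8192

// simplified stand-in for the per-tensor GPU extra
struct tensor_extra_gpu {
    void * data_device;
};

// simplified stand-in for the CUDA buffer context's pool of extras
struct extras_pool {
    struct tensor_extra_gpu extras[GGML_CUDA_MAX_NODES];
    size_t next;    // ring-buffer cursor
    size_t in_use;  // hypothetical counter used only for the proposed check
};

static struct tensor_extra_gpu * alloc_temp_extra(struct extras_pool * pool) {
    // proposed check: once more extras are live in one buffer than the pool
    // can hold, the ring wraps and silently overwrites an extra that is still
    // referenced by an earlier tensor, causing the corruption described above
    assert(pool->in_use < GGML_CUDA_MAX_NODES &&
           "too many tensors in one buffer; raise GGML_CUDA_MAX_NODES or grow the pool");

    struct tensor_extra_gpu * extra = &pool->extras[pool->next];
    memset(extra, 0, sizeof(*extra));
    pool->next = (pool->next + 1) % GGML_CUDA_MAX_NODES;
    pool->in_use++;
    return extra;
}
```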

YavorGIvanov, Dec 14 '23 09:12