bmwl
@ggerganov I'm trying to create a branch that improves NUMA performance, but am having trouble adapting the existing CPU memory buffer allocation code due to lack of familiarity with the...
@slaren Pinging you for help. I've been at this for over two months now, and have failed in my attempts to even force through a dirty mechanism just to prove...
> If you have any questions just ask. For llama.cpp and other projects using ggml-backend, the memory of the tensors is allocated in a `ggml_backend_buffer`, typically in `ggml_backend_cpu_buffer_type_alloc_buffer` but there...
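For reference, here is a minimal standalone sketch (assuming libnuma is installed; this is not llama.cpp code) of allocating a tensor-sized block on a specific NUMA node, which is roughly the kind of call that would have to replace the plain allocation inside a CPU buffer:

```c
// standalone demo, not llama.cpp code; build with: cc numa_onnode_demo.c -lnuma
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    const size_t size = 64u * 1024 * 1024; // 64 MiB, stand-in for a tensor buffer
    const int    node = 0;                 // target NUMA node

    // set the memory policy so the pages of this mapping are placed on `node`
    void * buf = numa_alloc_onnode(size, node);
    if (buf == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    memset(buf, 0, size); // first touch actually commits the pages on that node
    printf("allocated %zu bytes on node %d\n", size, node);

    numa_free(buf, size);
    return 0;
}
```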
> That's not really feasible because all the threads work on the same tensors, just different slices of them. Does that mean that my current approach of attempting to ensure...
> Is it possible to allocate a contiguous amount of memory and assign different slices of it to different NUMA nodes? As far as I am aware, no. Allocating on...
> Buffer types (`ggml_backend_buffer_type`) are mainly used as a way to allocate different types of buffers in a generic way. So you can use the same interface to allocate a...
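Understood. In case it helps the discussion, here is a hedged sketch of what that could look like on the CPU side. The helper names below are hypothetical (they are not ggml-backend API), but `numa_alloc_interleaved` is a real libnuma call that spreads the pages of one contiguous buffer round-robin over all nodes, which at least keeps remote accesses roughly balanced when every thread touches the same tensor:

```c
// hypothetical helper pair (the names are mine, not part of ggml-backend) that a
// custom buffer type's alloc/free callbacks could delegate to; build with -lnuma
#include <numa.h>
#include <stdlib.h>

// allocate `size` bytes with the pages interleaved round-robin across all NUMA
// nodes, falling back to plain malloc when NUMA is not available
static void * cpu_numa_interleaved_alloc(size_t size) {
    if (numa_available() < 0) {
        return malloc(size);
    }
    return numa_alloc_interleaved(size);
}

// release a buffer obtained from cpu_numa_interleaved_alloc
static void cpu_numa_interleaved_free(void * ptr, size_t size) {
    if (ptr == NULL) {
        return;
    }
    if (numa_available() < 0) {
        free(ptr);
    } else {
        numa_free(ptr, size);
    }
}
```

Whether interleaving actually helps or hurts relative to the default first-touch placement would of course need benchmarking.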
This should be fixed now by the solution in PR 5557: https://github.com/ggerganov/llama.cpp/compare/master...bmtwl:llama.cpp:master. Please let me know if the issue persists.
@dspasyuk Hmmm, your system should be able to use the stable syscall wrapper since your glibc is above 2.28. If you change ggml.c line 2153 to: `#if __GLIBC__ > 2...`
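For context, the pattern that check guards is the choice between the glibc `getcpu()` wrapper (added in glibc 2.29) and the raw `syscall(SYS_getcpu, ...)` fallback; a rough standalone sketch of that idea (not the actual ggml.c line) looks like this:

```c
// rough sketch of the version guard, not the actual ggml.c code; Linux-only
#define _GNU_SOURCE
#include <sched.h>       // getcpu() wrapper (glibc >= 2.29)
#include <unistd.h>      // syscall()
#include <sys/syscall.h> // SYS_getcpu
#include <stdio.h>

static int current_numa_node(void) {
    unsigned int cpu  = 0;
    unsigned int node = 0;
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 29))
    // glibc is new enough to have the stable wrapper
    getcpu(&cpu, &node);
#else
    // older glibc: fall back to the raw syscall
    syscall(SYS_getcpu, &cpu, &node, NULL);
#endif
    return (int) node;
}

int main(void) {
    printf("running on NUMA node %d\n", current_numa_node());
    return 0;
}
```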
As far as I know, MPI has been broken for over a year and needs extensive work to get it going again. I believe the recent work on the RPC...