Georgi Gerganov
We have rudimentary support for adding F32 custom `ggml` operators in user space via: https://github.com/ggerganov/ggml/blob/a1d0ea7c2abd44f56822ffdfcfe0a0fcf7170885/include/ggml/ggml.h#L1173-L1240

The API has to be improved to support:

- multi-threading
- non-F32 types

We also...
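One possible shape for multi-threading support is to hand the user callback a worker index `ith` and a worker count `nth`, similar to how `ggml`'s built-in kernels split their work by thread index, so that each call processes only its own slice. The sketch below is hypothetical: `custom_op_f32_t`, `scale_op` and `run_custom_op` are illustrative names, not the current `ggml` API, and the "scheduler" simply calls every worker index in turn instead of spawning threads.

```c
#include <stdint.h>

/* Hypothetical callback signature: ith/nth identify the calling worker. */
typedef void (*custom_op_f32_t)(float * dst, const float * src, int64_t n,
                                int ith, int nth, void * userdata);

/* Example op: dst = src * scale, restricted to this worker's slice. */
static void scale_op(float * dst, const float * src, int64_t n,
                     int ith, int nth, void * userdata) {
    const float   scale = *(const float *) userdata;
    const int64_t per   = (n + nth - 1) / nth;          /* elements per worker, rounded up */
    const int64_t i0    = per * ith;
    const int64_t i1    = i0 + per < n ? i0 + per : n;  /* clamp the last slice */
    for (int64_t i = i0; i < i1; ++i) {
        dst[i] = src[i] * scale;
    }
}

/* Sequential stand-in for a scheduler: invokes every worker index once.
   A real implementation would run each ith on its own thread. */
static void run_custom_op(custom_op_f32_t op, float * dst, const float * src,
                          int64_t n, int nth, void * userdata) {
    for (int ith = 0; ith < nth; ++ith) {
        op(dst, src, n, ith, nth, userdata);
    }
}
```

Because each worker writes a disjoint range of `dst`, the calls need no synchronization with each other.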
This is a big one. The only reason we use BLAS is that we don't have an efficient implementation of `matrix x matrix` multiplication. Naively doing parallel dot products is not...
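To illustrate the difference, the sketch below puts a naive version, which computes every output element as an independent dot product, next to a cache-blocked (tiled) version. Tiling is only one common technique for efficient `matrix x matrix` multiplication and is not `ggml`'s implementation; the `TILE` value and all function names are assumptions for illustration. Both compute `C = A * B` for row-major `A` (M x K), `B` (K x N), and `C` (M x N).

```c
#define TILE 32  /* arbitrary block size (assumption); real code would tune it per CPU */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Naive: each C[i][j] is a separate dot product of row i of A with
   column j of B. Walking a column of B is strided, so cache reuse is poor. */
static void matmul_dot(const float * A, const float * B, float * C,
                       int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) {
                sum += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
        }
    }
}

/* Tiled: processes TILE x TILE blocks so that the blocks of A, B and C
   being combined stay in cache while they are reused. */
static void matmul_tiled(const float * A, const float * B, float * C,
                         int M, int N, int K) {
    for (int i = 0; i < M * N; ++i) C[i] = 0.0f;
    for (int i0 = 0; i0 < M; i0 += TILE) {
        const int i1 = min_int(i0 + TILE, M);
        for (int k0 = 0; k0 < K; k0 += TILE) {
            const int k1 = min_int(k0 + TILE, K);
            for (int j0 = 0; j0 < N; j0 += TILE) {
                const int j1 = min_int(j0 + TILE, N);
                /* multiply-accumulate one block; the innermost loop walks
                   rows of B and C contiguously */
                for (int i = i0; i < i1; ++i) {
                    for (int k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        for (int j = j0; j < j1; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```

The two functions do the same arithmetic; only the loop order and blocking differ. Production BLAS libraries go much further (register blocking, SIMD, packing), so this sketch only shows the direction, not the achievable speed.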
See https://github.com/ggerganov/llama.cpp/pull/1507 and the comment: https://github.com/ggerganov/llama.cpp/pull/1507#issuecomment-1605302021

> I guess we can extend ggml to be able to choose work chunk distribution method - either at compile time, or via a context...
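As an illustration of what a selectable distribution method could look like, the sketch below assigns rows to thread `ith` of `nth` either as one contiguous block or interleaved with a stride of `nth`. The enum and function names are hypothetical, and these two strategies are examples only; they are not taken from the PR.

```c
/* Hypothetical selector for how rows are split among nth threads. */
enum chunk_mode {
    CHUNK_CONTIGUOUS,   /* thread ith gets one contiguous block of rows       */
    CHUNK_INTERLEAVED,  /* thread ith gets rows ith, ith + nth, ith + 2*nth... */
};

/* Writes the row indices assigned to thread ith into out[] and returns how
   many there are. out must have room for at least nrows entries. */
static int assign_rows(enum chunk_mode mode, int nrows, int ith, int nth, int * out) {
    int count = 0;
    if (mode == CHUNK_CONTIGUOUS) {
        const int per = (nrows + nth - 1) / nth;            /* rows per thread, rounded up */
        const int r0  = per * ith;
        const int r1  = r0 + per < nrows ? r0 + per : nrows;
        for (int r = r0; r < r1; ++r) out[count++] = r;
    } else {
        for (int r = ith; r < nrows; r += nth) out[count++] = r;
    }
    return count;
}
```

Contiguous blocks keep each thread's memory accesses local, while interleaving spreads uneven per-row costs more evenly across threads. Making the mode a parameter (or a compile-time constant) lets the two be benchmarked against each other.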
See https://github.com/ggerganov/llama.cpp/pull/1556

The PR not only adds NUMA support but also improves the threading logic in `ggml`, which appears to bring a significant speed-up. It needs to be verified for correctness before merging.
The initial design of the compute tasks in `ggml` was for each one to have 3 separate stages:

- `GGML_TASK_INIT`
- `GGML_TASK_COMPUTE`
- `GGML_TASK_FINALIZE`

So far, the `GGML_TASK_FINALIZE` step has been...
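A minimal sketch of the three-stage idea, using a sum reduction as the example task: INIT clears per-thread partials, COMPUTE accumulates a strided subset on every worker, and FINALIZE combines the partials. The enum and struct names are hypothetical stand-ins rather than `ggml`'s own definitions, the choice to do INIT and FINALIZE work only on worker 0 is an assumption of this sketch, and the loop in `run_task` plays the role of the barrier between stages.

```c
#include <stdint.h>

/* Hypothetical mirror of the three-stage task design. */
enum task_stage { TASK_INIT, TASK_COMPUTE, TASK_FINALIZE };

#define MAX_THREADS 8  /* nth must not exceed this */

struct sum_task {
    const float * src;
    int64_t       n;
    float         partial[MAX_THREADS];  /* one accumulator per worker */
    float         result;
};

/* One callback handles every stage; ith/nth identify the worker. */
static void sum_forward(struct sum_task * t, enum task_stage stage, int ith, int nth) {
    switch (stage) {
    case TASK_INIT:      /* setup, done by worker 0 only */
        if (ith == 0) {
            for (int i = 0; i < nth; ++i) t->partial[i] = 0.0f;
        }
        break;
    case TASK_COMPUTE:   /* every worker sums a strided subset */
        for (int64_t i = ith; i < t->n; i += nth) t->partial[ith] += t->src[i];
        break;
    case TASK_FINALIZE:  /* reduction of the partials, done by worker 0 only */
        if (ith == 0) {
            t->result = 0.0f;
            for (int i = 0; i < nth; ++i) t->result += t->partial[i];
        }
        break;
    }
}

/* Sequential stand-in for the scheduler: each stage runs for all workers
   before the next stage starts, which is what a barrier would enforce. */
static void run_task(struct sum_task * t, int nth) {
    for (int s = TASK_INIT; s <= TASK_FINALIZE; ++s) {
        for (int ith = 0; ith < nth; ++ith) {
            sum_forward(t, (enum task_stage) s, ith, nth);
        }
    }
}
```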
Certain `ggml` operators can be combined into a single operator with extra arguments. For example:

- `GGML_OP_CONV_1D_S1_PH`
- `GGML_OP_CONV_1D_S2_PH`

can all become:

- `GGML_OP_CONV_1D`

with a signature of:

```c
GGML_API...
```
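To show why one parameterized kernel can replace the specialized variants, here is a plain-C, single-channel 1D convolution that takes stride `s`, padding `p` and dilation `d` as arguments. The function names and this parameter set are illustrative assumptions, not the signature proposed in the issue. The sketch also assumes that `S1`/`S2` denote stride 1/2 and `PH` denotes "half" padding (`p = k / 2`).

```c
/* Output length of a 1D convolution with input length n, kernel length k,
   stride s (> 0), zero padding p and dilation d. */
static int conv_1d_out_len(int n, int k, int s, int p, int d) {
    return (n + 2 * p - d * (k - 1) - 1) / s + 1;
}

/* Single-channel 1D convolution (cross-correlation, as in most NN frameworks).
   Positions that fall into the padding contribute zero.
   out must hold conv_1d_out_len(n, k, s, p, d) elements. */
static void conv_1d(const float * src, int n, const float * ker, int k,
                    int s, int p, int d, float * out) {
    const int n_out = conv_1d_out_len(n, k, s, p, d);
    for (int o = 0; o < n_out; ++o) {
        float sum = 0.0f;
        for (int j = 0; j < k; ++j) {
            const int i = o * s - p + j * d;  /* input index, may lie in the padding */
            if (i >= 0 && i < n) {
                sum += src[i] * ker[j];
            }
        }
        out[o] = sum;
    }
}
```

Under the naming assumption above, a kernel of length 3 would cover both existing variants with `p = 1, d = 1` and either `s = 1` or `s = 2`.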
We now have the following signature:

```c
GGML_API void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph);
```

The provided context is used during the computation to potentially allocate a...
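One alternative to allocating from `ctx` during computation is to split the call into a planning step that only reports how much scratch memory is needed, and a compute step that uses a buffer supplied by the caller. The sketch below is hypothetical: `compute_plan`, `plan_dot` and `compute_dot` are illustrative names, not `ggml` API, and the "graph" is reduced to a single dot product that needs one float of scratch per thread.

```c
#include <stddef.h>

/* Hypothetical plan: how much scratch memory a computation needs. */
struct compute_plan {
    size_t work_size;  /* bytes of scratch memory required   */
    void * work_data;  /* buffer supplied by the caller       */
};

/* Step 1: report the scratch size without allocating anything.
   Here the work is a dot product needing one partial sum per thread. */
static struct compute_plan plan_dot(int n_threads) {
    struct compute_plan plan = { (size_t) n_threads * sizeof(float), NULL };
    return plan;
}

/* Step 2: compute using only the caller's buffer; nothing is allocated here.
   Threads are simulated sequentially; each "thread" t sums a strided subset. */
static float compute_dot(const struct compute_plan * plan, const float * a,
                         const float * b, int n, int n_threads) {
    float * partial = (float *) plan->work_data;
    for (int t = 0; t < n_threads; ++t) {
        partial[t] = 0.0f;
        for (int i = t; i < n; i += n_threads) partial[t] += a[i] * b[i];
    }
    float sum = 0.0f;
    for (int t = 0; t < n_threads; ++t) sum += partial[t];
    return sum;
}
```

With this split the caller decides where the scratch memory lives (heap, stack, or a buffer reused across calls), and the compute step no longer needs a context for allocation.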
This task is described well in https://github.com/ggerganov/llama.cpp/pull/1237 The WIP implementation in that PR might be a bit outdated by now, so one can either attempt to update it or implement...
In `llama.cpp` we have logic for supporting some very old model formats and features, such as sharded models, which makes the code unnecessarily complicated and difficult to maintain. We...
See https://github.com/ggerganov/whisper.cpp/pull/1027