Georgi Gerganov

Results: 136 issues by Georgi Gerganov

We have rudimentary support for adding F32 custom `ggml` operators in user space via: https://github.com/ggerganov/ggml/blob/a1d0ea7c2abd44f56822ffdfcfe0a0fcf7170885/include/ggml/ggml.h#L1173-L1240

The API has to be improved to support:
- multi-threading
- non-F32 types

We also...
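For illustration, a minimal sketch of what a user-space F32 operator could look like, assuming the `ggml_map_unary_f32` helper and the `(n, dst, src)` callback signature exposed in the linked header range:

```c
#include <math.h>
#include "ggml.h"

// user-space custom op: an element-wise GELU approximation over F32 data
// callback signature (n, dst, src) is assumed from the ggml_unary_op_f32_t typedef
static void my_gelu_f32(const int n, float * dst, const float * src) {
    for (int i = 0; i < n; ++i) {
        const float x = src[i];
        dst[i] = 0.5f*x*(1.0f + tanhf(0.79788456f*(x + 0.044715f*x*x*x)));
    }
}

// usage during graph construction (hedged - helper name taken from the linked header):
// struct ggml_tensor * out = ggml_map_unary_f32(ctx, a, my_gelu_f32);
```

The callback only ever sees raw F32 buffers and runs on a single thread, which is exactly why the issue asks for multi-threading and non-F32 support.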

enhancement
good first issue
refactoring

This is a big one. The only reason we use BLAS is that we don't have an efficient implementation of `matrix x matrix` multiplication. Naively doing parallel dot products is not...
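A minimal sketch of the contrast being made, assuming plain row-major C buffers with the dot product running along contiguous rows: naive per-element dot products versus a cache-blocked variant that reuses tiles of both operands.

```c
// naive: one dot product per output element -- little cache reuse of B
void matmul_naive(const float * A, const float * B, float * C, int M, int N, int K) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) {
                sum += A[i*K + k] * B[j*K + k]; // B stored as N x K (transposed layout)
            }
            C[i*N + j] = sum;
        }
    }
}

// blocked/tiled: process small tiles so slices of A and B stay in cache
void matmul_blocked(const float * A, const float * B, float * C, int M, int N, int K) {
    const int T = 32; // tile size, tuned to the cache
    for (int i = 0; i < M*N; ++i) C[i] = 0.0f;
    for (int i0 = 0; i0 < M; i0 += T) {
        for (int j0 = 0; j0 < N; j0 += T) {
            for (int k0 = 0; k0 < K; k0 += T) {
                for (int i = i0; i < i0 + T && i < M; ++i) {
                    for (int j = j0; j < j0 + T && j < N; ++j) {
                        float sum = C[i*N + j];
                        for (int k = k0; k < k0 + T && k < K; ++k) {
                            sum += A[i*K + k] * B[j*K + k];
                        }
                        C[i*N + j] = sum;
                    }
                }
            }
        }
    }
}
```

The blocked version does the same arithmetic; it only reorders it so that each loaded tile is reused many times before being evicted, which is the main thing BLAS does better than naive dot products.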

refactoring
performance

See https://github.com/ggerganov/llama.cpp/pull/1507 and this comment: https://github.com/ggerganov/llama.cpp/pull/1507#issuecomment-1605302021

> I guess we can extend ggml to be able to choose work chunk distribution method - either at compile time, or via a context...
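As a rough illustration of the two chunk distribution methods the quoted comment alludes to (static slicing versus dynamic chunking off a shared counter), here is a sketch; the helper names are made up for the example:

```c
#include <stdatomic.h>

// static distribution: thread `ith` of `nth` handles a fixed contiguous slice of rows
void work_static(int ith, int nth, int nrows, void (*do_row)(int)) {
    const int dr  = (nrows + nth - 1)/nth;
    const int ir0 = dr*ith;
    const int ir1 = ir0 + dr < nrows ? ir0 + dr : nrows;
    for (int ir = ir0; ir < ir1; ++ir) do_row(ir);
}

// dynamic distribution: threads grab chunks from a shared atomic counter,
// which balances better when the per-row cost varies
void work_dynamic(atomic_int * next, int nrows, int chunk, void (*do_row)(int)) {
    while (1) {
        const int ir0 = atomic_fetch_add(next, chunk);
        if (ir0 >= nrows) break;
        const int ir1 = ir0 + chunk < nrows ? ir0 + chunk : nrows;
        for (int ir = ir0; ir < ir1; ++ir) do_row(ir);
    }
}
```

Static slicing has zero synchronization but can leave threads idle; dynamic chunking costs one atomic per chunk but adapts to uneven work, which is why making the method selectable is attractive.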

refactoring
performance

See https://github.com/ggerganov/llama.cpp/pull/1556. The PR not only adds NUMA support, but also improves the threading logic in `ggml`, which appears to bring a significant speed-up. It needs to be verified for correctness before merging.
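Not necessarily what PR 1556 does, but as background: NUMA-aware threading usually starts with pinning worker threads to cores so that first-touch allocations land on the local node. A minimal Linux-only sketch using `pthread_setaffinity_np`:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// pin the calling thread to a single CPU so that its memory allocations
// stay on the local NUMA node under the first-touch policy
static int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```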

enhancement
performance

The initial design of the compute tasks in `ggml` was for each one to have 3 separate stages:
- `GGML_TASK_INIT`
- `GGML_TASK_COMPUTE`
- `GGML_TASK_FINALIZE`

So far, the `GGML_TASK_FINALIZE` step has been...
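For context, an op implementation roughly dispatches on the stage as sketched below; the `ggml_compute_params` field names are assumptions from memory and may not match the header exactly:

```c
// rough shape of a ggml op forward function handling the three stages
static void ggml_compute_forward_my_op(const struct ggml_compute_params * params,
                                       const struct ggml_tensor * src,
                                             struct ggml_tensor * dst) {
    switch (params->type) {
        case GGML_TASK_INIT:
            // single-threaded: prepare scratch data (e.g. in params->wdata), zero accumulators
            break;
        case GGML_TASK_COMPUTE:
            // multi-threaded: thread params->ith of params->nth processes its slice of rows
            break;
        case GGML_TASK_FINALIZE:
            // single-threaded: reduce per-thread partial results into dst
            break;
    }
}
```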

good first issue
refactoring
performance

Certain `ggml` operators can be combined into a single operator with extra arguments. For example:
- `GGML_OP_CONV_1D_S1_PH`
- `GGML_OP_CONV_1D_S2_PH`

can all become:
- `GGML_OP_CONV_1D`

with a signature of: `GGML_API...`
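A hedged sketch of what the combined signature could look like, with the stride (and, speculatively, padding and dilation) passed as explicit arguments instead of being encoded in the op name:

```c
// hypothetical combined operator: one GGML_OP_CONV_1D node parameterized by
// stride/padding/dilation, replacing the per-stride op variants
GGML_API struct ggml_tensor * ggml_conv_1d(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,   // convolution kernel
        struct ggml_tensor  * b,   // input data
        int                   s0,  // stride
        int                   p0,  // padding
        int                   d0); // dilation
```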

good first issue
refactoring

We now have the following signature:

```c
GGML_API void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph);
```

The provided context is used during the computation to potentially allocate a...
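One possible direction, sketched here with hypothetical names rather than the actual `ggml` API: let a planning step report the required scratch size and have the caller own the buffer, so that no `ggml_context` is needed at compute time.

```c
// hypothetical decoupled API: the caller sizes and owns the work buffer
struct ggml_compute_plan {
    size_t work_size;  // required scratch size, filled in by the planning step
    void * work_data;  // buffer allocated and owned by the caller
    int    n_threads;
};

// GGML_API struct ggml_compute_plan ggml_graph_plan   (struct ggml_cgraph * cgraph, int n_threads);
// GGML_API void                     ggml_graph_compute(struct ggml_cgraph * cgraph, struct ggml_compute_plan * plan);
```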

good first issue
refactoring

This task is described well in https://github.com/ggerganov/llama.cpp/pull/1237. The WIP implementation in that PR might be a bit outdated by now, so one can either attempt to update it or implement...

good first issue
refactoring

In `llama.cpp` we have logic for supporting some very old model formats and features, such as sharded models, which makes the code unnecessarily complicated and difficult to maintain. We...

good first issue
refactoring

See https://github.com/ggerganov/whisper.cpp/pull/1027