Georgi Gerganov issues

Results: 136 issues of Georgi Gerganov

ggml : improve custom operator support

We have rudimentary support for adding F32 custom `ggml` operators in user space via: https://github.com/ggerganov/ggml/blob/a1d0ea7c2abd44f56822ffdfcfe0a0fcf7170885/include/ggml/ggml.h#L1173-L1240
The API has to be improved to support:
- multi-threading
- non-F32 types
We also...

enhancement

good first issue

refactoring

ggml : get rid of BLAS and all its variants

This is a big one. The only reason we use BLAS is that we don't have an efficient implementation of `matrix x matrix` multiplication. Naively doing parallel dot products is not...

refactoring

performance
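
To make the contrast in this issue concrete, here is a minimal sketch of the naive "one dot product per output element" approach next to a cache-blocked variant. Blocking is only one common technique for faster `matrix x matrix` multiplication; the preview does not say which approach the issue has in mind. The function names, the `TILE` constant, and the row-major `C = A * B^T` layout are assumptions made for illustration and are not part of the `ggml` API.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* arbitrary tile edge, chosen only for illustration */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical helper, not a ggml function.
 * C = A * B^T with A (m x k) and B (n x k), both row-major, so each
 * output element is the dot product of two contiguous rows. */
static void matmul_naive(const float *a, const float *b, float *c,
                         size_t m, size_t n, size_t k) {
    assert(a && b && c);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += a[i * k + p] * b[j * k + p];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Same result, but the loops walk TILE x TILE x TILE blocks so that the
 * rows of A and B touched by a block are reused while still in cache. */
static void matmul_tiled(const float *a, const float *b, float *c,
                         size_t m, size_t n, size_t k) {
    assert(a && b && c);
    for (size_t i = 0; i < m * n; i++) {
        c[i] = 0.0f;
    }
    for (size_t i0 = 0; i0 < m; i0 += TILE) {
        const size_t i1 = min_sz(i0 + TILE, m);
        for (size_t j0 = 0; j0 < n; j0 += TILE) {
            const size_t j1 = min_sz(j0 + TILE, n);
            for (size_t p0 = 0; p0 < k; p0 += TILE) {
                const size_t p1 = min_sz(p0 + TILE, k);
                for (size_t i = i0; i < i1; i++) {
                    for (size_t j = j0; j < j1; j++) {
                        float sum = c[i * n + j]; /* partial sum so far */
                        for (size_t p = p0; p < p1; p++) {
                            sum += a[i * k + p] * b[j * k + p];
                        }
                        c[i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

Both functions produce the same values; only the memory access order differs. A tuned implementation would add SIMD and a thread split on top of this.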

ggml : add option for controlling work distribution across threads

See https://github.com/ggerganov/llama.cpp/pull/1507 and this comment: https://github.com/ggerganov/llama.cpp/pull/1507#issuecomment-1605302021
> I guess we can extend ggml to be able to choose work chunk distribution method - either at compile time, or via a context...

refactoring

performance
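
As a rough illustration of what a selectable "work chunk distribution method" could look like, the sketch below computes which rows a given thread handles under two strategies: one contiguous block per thread, or rows interleaved across threads. The enum, struct, and function names are hypothetical and are not taken from `ggml` or the linked PR; only the `ith`/`nth` naming (thread index, thread count) follows the usual `ggml` convention.

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of ggml. */
enum work_dist {
    WORK_DIST_CONTIGUOUS,  /* thread ith gets one block of consecutive rows */
    WORK_DIST_INTERLEAVED  /* thread ith gets rows ith, ith+nth, ith+2*nth, ... */
};

struct row_range {
    int first; /* first row handled by this thread */
    int end;   /* one past the last candidate row */
    int step;  /* stride between handled rows */
};

/* nr = total number of rows, ith = thread index, nth = thread count. */
static struct row_range rows_for_thread(enum work_dist dist,
                                        int nr, int ith, int nth) {
    assert(nr >= 0 && nth > 0 && ith >= 0 && ith < nth);
    struct row_range r;
    if (dist == WORK_DIST_CONTIGUOUS) {
        const int dr = (nr + nth - 1) / nth; /* rows per thread, rounded up */
        r.first = dr * ith;
        r.end   = r.first + dr < nr ? r.first + dr : nr;
        r.step  = 1;
    } else {
        r.first = ith;
        r.end   = nr;
        r.step  = nth;
    }
    return r;
}
```

A worker would then loop with `for (int i = r.first; i < r.end; i += r.step)`. Making the strategy a runtime value corresponds to the "via a context" option in the quoted comment; a compile-time choice would replace the enum with a preprocessor switch.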

ggml : add NUMA support

See https://github.com/ggerganov/llama.cpp/pull/1556 The PR not only adds NUMA support, but also improves the threading logic in `ggml`, which looks like it brings a significant speed-up. Needs to be verified for correctness before merging.

enhancement

performance

ggml : deprecate `GGML_TASK_FINALIZE`

The initial design of the compute tasks in `ggml` was for each one to have 3 separate stages:
- `GGML_TASK_INIT`
- `GGML_TASK_COMPUTE`
- `GGML_TASK_FINALIZE`
So far, the `GGML_TASK_FINALIZE` step has been...

good first issue

refactoring

performance

ggml : reduce the values of `ggml_op` enum

Certain `ggml` operators can be combined into a single operator with extra arguments. For example:
- `GGML_OP_CONV_1D_S1_PH`
- `GGML_OP_CONV_1D_S2_PH`
can all become:
- `GGML_OP_CONV_1D`
with a signature of: ```c GGML_API...

good first issue

refactoring
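
The idea of folding variants into one operator with extra arguments can be sketched with a standalone 1D convolution that takes stride and padding as parameters. This assumes that `S1`/`S2` in the enum names denote stride 1/2 and that `PH` denotes half-kernel padding. The function below is hypothetical; the signature actually proposed in the issue is cut off in the preview and is not reproduced here.

```c
#include <assert.h>

/* Hypothetical standalone function, not the ggml API.
 * x: input of length n, w: kernel of length k, y: output buffer.
 * Returns the number of output samples written to y. */
static int conv_1d(const float *x, int n, const float *w, int k,
                   int stride, int pad, float *y) {
    assert(n > 0 && k > 0 && stride > 0 && pad >= 0 && n + 2 * pad >= k);
    const int n_out = (n + 2 * pad - k) / stride + 1;
    for (int o = 0; o < n_out; o++) {
        float sum = 0.0f;
        for (int j = 0; j < k; j++) {
            const int i = o * stride + j - pad; /* index into unpadded input */
            if (i >= 0 && i < n) {              /* positions outside are zero */
                sum += x[i] * w[j];
            }
        }
        y[o] = sum;
    }
    return n_out;
}
```

Under the stated assumption, the two specialized variants become `conv_1d(x, n, w, k, 1, k / 2, y)` and `conv_1d(x, n, w, k, 2, k / 2, y)`, so one enum value with two extra arguments covers both.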

ggml : `ggml_graph_compute` should not require `ggml_context`

We now have the following signature:
```c
GGML_API void ggml_graph_compute(struct ggml_context * ctx, struct ggml_cgraph * cgraph);
```
The provided context is used during the computation to potentially allocate a...

good first issue

refactoring
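
One common way to drop an allocator-style argument from a compute call is to let the caller ask how much temporary memory is needed and then pass in a buffer of that size. The sketch below shows that pattern on a toy "graph", assuming the allocation in question is scratch memory (the preview is cut off before saying so). Every name here is hypothetical; this is not the `ggml` API and not necessarily the design the issue settles on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a compute graph: sums the squares of src. */
struct toy_graph {
    const float *src; /* input values */
    size_t       n;   /* number of input values */
    float        result;
};

/* Step 1: the caller asks how much scratch memory the computation needs. */
static size_t toy_graph_work_size(const struct toy_graph *g) {
    assert(g != NULL);
    return g->n * sizeof(float);
}

/* Step 2: the caller supplies the scratch buffer, so no context or
 * allocator is needed here. Returns 0 on success, -1 if the buffer
 * is missing or too small. */
static int toy_graph_compute(struct toy_graph *g, void *work, size_t work_size) {
    assert(g != NULL);
    const size_t need = toy_graph_work_size(g);
    if (need > 0 && (work == NULL || work_size < need)) {
        return -1;
    }
    float *tmp = work;
    for (size_t i = 0; i < g->n; i++) {
        tmp[i] = g->src[i] * g->src[i]; /* intermediate values live in scratch */
    }
    float sum = 0.0f;
    for (size_t i = 0; i < g->n; i++) {
        sum += tmp[i];
    }
    g->result = sum;
    return 0;
}
```

With this split, the caller decides where the scratch memory comes from (stack, heap, or a reused pool), and the compute function itself never allocates.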

ggml : generalize `quantize_fns` for simpler FP16 handling

This task is described well in https://github.com/ggerganov/llama.cpp/pull/1237 The WIP implementation in that PR might be a bit outdated by now, so one can either attempt to update it or implement...

good first issue

refactoring

llama : refactor model loading code

In `llama.cpp` we have logic for supporting some very old model formats and features such as sharded models, which makes the code unnecessarily complicated and difficult to maintain. We...

good first issue

refactoring

ggml : do not use `_GNU_SOURCE` gratuitously

See https://github.com/ggerganov/whisper.cpp/pull/1027