
ggml : add GPU support for Mamba models

ggerganov opened this issue 1 year ago • 0 comments

Initial CPU-only Mamba support was recently introduced in #5328 by @compilade.

To run these models efficiently on the GPU, we are currently missing kernel implementations for the following 2 ops:

  • GGML_OP_SSM_CONV
  • GGML_OP_SSM_SCAN
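For anyone looking into the kernels, a naive CPU reference can help pin down the semantics before porting. Below is a simplified scalar sketch of the selective state-space recurrence that `GGML_OP_SSM_SCAN` computes (single channel, explicit `d_state` loop; the names, shapes, and zero-order-hold discretization here are illustrative assumptions, not the actual ggml tensor layout):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive reference for the recurrence behind an SSM scan (simplified sketch):
//   h_t = exp(dt * A) * h_{t-1} + dt * B * x_t
//   y_t = C . h_t
// One input channel, state size d_state. Names are hypothetical, not ggml's.
std::vector<float> ssm_scan_ref(const std::vector<float>& x,  // input sequence, length T
                                const std::vector<float>& A,  // state decay, size d_state
                                const std::vector<float>& B,  // input projection, size d_state
                                const std::vector<float>& C,  // output projection, size d_state
                                float dt) {                   // discretization step
    const std::size_t d_state = A.size();
    std::vector<float> h(d_state, 0.0f);  // hidden state carried across time steps
    std::vector<float> y;
    y.reserve(x.size());
    for (float xt : x) {
        float yt = 0.0f;
        for (std::size_t i = 0; i < d_state; ++i) {
            // discretize A and B (zero-order hold, simplified), then step the state
            h[i] = std::exp(dt * A[i]) * h[i] + dt * B[i] * xt;
            yt  += C[i] * h[i];
        }
        y.push_back(yt);
    }
    return y;
}
```

The sequential dependence on `h` across time steps is what makes this op nontrivial to parallelize on GPU; per-channel and per-batch parallelism is still available, which is the usual starting point for a CUDA/Metal port.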

Creating this issue to keep track of this and give more visibility to this feature. Help with implementing the missing kernels for CUDA and Metal (and potentially other backends) is welcome. We can also discuss whether anything else is required to better support this architecture in llama.cpp.

ggerganov avatar Apr 19 '24 06:04 ggerganov