
Improve ALPACA speed using GPU

Open multimediaconverter opened this issue 1 year ago • 2 comments

Take a look at this project:

https://github.com/Const-me/Whisper

It is a Windows port of ggerganov's whisper.cpp implementation using DirectCompute -- another name for that technology is "compute shaders in Direct3D 11".

The author claims it shouldn't be hard to support another ML model, since the compute shaders and relevant infrastructure are already implemented in that project.

I suspect this library could significantly improve the speed of ALPACA chat.

multimediaconverter avatar Mar 24 '23 11:03 multimediaconverter

I am not an expert, but I suspect that even though the GPU is faster, the neural network weights must fit in GPU memory for this to work. The NVIDIA RTX 4090 has 24 GB, so the 30B model will not fit there, but the 13B model works well on the CPU.

openMolNike avatar Mar 25 '23 11:03 openMolNike

Add a parameter to let the user choose between GPU and CPU... even DirectML XD would be the best choice.

fenixlam avatar Apr 01 '23 15:04 fenixlam