
Improve ALPACA speed using GPU

Open multimediaconverter opened this issue 1 year ago • 2 comments

Take a look at this project:

https://github.com/Const-me/Whisper

It is a Windows port of ggerganov's whisper.cpp implementation using DirectCompute -- another name for that technology is "compute shaders in Direct3D 11".

The author claims it shouldn't be hard to support another ML model, since the compute shaders and relevant infrastructure are already implemented in that project.

I suspect this library could significantly improve the speed of ALPACA chat.

multimediaconverter avatar Mar 24 '23 11:03 multimediaconverter

I am not an expert, but I suspect that even though the GPU is faster, the neural network weights must fit in GPU memory for this to work. The NVIDIA RTX 4090 has 24 GB, so the 30B model will not fit there, but the 13B model works well on the CPU.

openMolNike avatar Mar 25 '23 11:03 openMolNike

Add a parameter to let the user choose between GPU and CPU... even DirectML XD would be the best choice.

fenixlam avatar Apr 01 '23 15:04 fenixlam