apcameron
Please add OpenCL support so that it can be used on GPUs that support OpenCL rather than CUDA.
Please add support for RISC-V based systems.
Is it possible to provide an API that mimics the functionality of the OpenAI API?
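For reference, a minimal sketch of what an OpenAI-compatible client call could look like against a local server. This assumes a hypothetical endpoint at `http://localhost:8080/v1`; the base URL, port, and model name are placeholders, not the project's actual configuration:

```python
# pip install openai
from openai import OpenAI

# Placeholder endpoint for a locally hosted OpenAI-compatible server.
# Adjust base_url to wherever the local server is actually listening.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

The point is that existing OpenAI client libraries could then be pointed at the local server just by changing the base URL.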
### Prerequisites

- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [X] I searched using keywords...
Please add support for the latest Meta models (Llama 3.1): https://ai.meta.com/blog/meta-llama-3-1/
When I run generation_inference.py I get the error below: `RuntimeError: FlashAttention only supports Ampere GPUs or newer.` Please add an option to disable it.
Please add support for Pascal-based GPUs. This used to work in older versions of flash-attention. A rough sketch of the kind of check that could gate FlashAttention on older GPUs is shown below.
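This is only an illustration of the idea, assuming PyTorch is available; the helper name is made up:

```python
import torch

def flash_attention_supported() -> bool:
    """Return True only on Ampere (compute capability 8.0) or newer GPUs.

    FlashAttention 2 requires SM 8.0+, so older architectures such as
    Pascal (6.x) would fall back to another attention path.
    """
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8

use_flash_attention = flash_attention_supported()
print(f"FlashAttention enabled: {use_flash_attention}")
```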
Please consider enabling the use of PyTorch's scaled_dot_product_attention as an alternative for those with older GPUs. See this example from another project: https://github.com/HiDream-ai/HiDream-I1/pull/27
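For illustration, a minimal sketch of calling PyTorch's scaled_dot_product_attention directly; the tensor shapes and values are placeholders:

```python
import torch
import torch.nn.functional as F

# Placeholder tensors with shape (batch, num_heads, seq_len, head_dim).
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(1, 8, 128, 64, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

# scaled_dot_product_attention dispatches to the best available backend
# (flash, memory-efficient, or plain math), so it still runs on
# pre-Ampere GPUs where the flash kernel is not supported.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

Because SDPA falls back to a non-flash backend automatically, it could serve as the alternative path when FlashAttention is unavailable.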