[FEATURE] Add SYCL feature flags to rllm-llamacpp build (to add support for Intel GPUs)
The goal of this pull request is to add support for Intel GPUs to the build scripts of rllm-llamacpp. This is done by:
- expanding the cargo feature flags of rllm-llamacpp and rllm-llamacpp_low to incorporate SYCL (see the Cargo.toml sketch after this list)
- adding the needed build arguments in main.rs of rllm-llamacpp_low (see the Rust sketch below)
- adding appropriate new options to server.sh
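
As a rough illustration of the first point, the feature wiring could look like the sketch below. The feature name `sycl` is an assumption for illustration, not necessarily the name used in this PR; the idea is that the high-level crate simply forwards the flag to the low-level crate that drives the native build.

```toml
# rllm-llamacpp/Cargo.toml (sketch; the `sycl` feature name is hypothetical)
[features]
default = []
# Enable the SYCL build of llama.cpp for Intel GPUs by forwarding
# the flag to rllm-llamacpp_low, which performs the native build.
sycl = ["rllm-llamacpp_low/sycl"]

[dependencies]
rllm-llamacpp_low = { path = "../rllm-llamacpp_low" }
```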
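For the second point, here is a minimal sketch of how main.rs in rllm-llamacpp_low might translate that cargo feature into build arguments, assuming the crate drives llama.cpp's cmake configure step via std::process::Command. The -DLLAMA_SYCL=ON define and the icx/icpx compilers come from llama.cpp's SYCL build instructions (newer llama.cpp versions renamed the flag to GGML_SYCL); everything else is illustrative.

```rust
use std::process::Command;

// Assemble the cmake arguments, switching llama.cpp over to the
// SYCL backend when the (hypothetical) `sycl` cargo feature is set.
fn cmake_args() -> Vec<&'static str> {
    let mut args = vec!["-DCMAKE_BUILD_TYPE=Release"];
    if cfg!(feature = "sycl") {
        args.push("-DLLAMA_SYCL=ON");
        // llama.cpp's SYCL build expects the Intel oneAPI compilers.
        args.push("-DCMAKE_C_COMPILER=icx");
        args.push("-DCMAKE_CXX_COMPILER=icpx");
    }
    args
}

fn main() {
    let status = Command::new("cmake")
        .args(cmake_args())
        .arg("llama.cpp") // path to the vendored llama.cpp sources
        .status()
        .expect("failed to run cmake");
    assert!(status.success(), "cmake configure step failed");
}
```

With wiring like this, an Intel GPU build would be requested with something like `cargo build --release --features sycl`, with server.sh passing the equivalent option through.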
The code will be tested on a machine with an Intel 13th Gen processor and an Intel Arc A770 GPU.
@microsoft-github-policy-service agree
The basic build works now. 🚀 I want to test the feature again once the llama.cpp version gets bumped.
Known Issues:
- llama.cpp hangs after loading the model into VRAM on Intel Arc -> this is a bug in the current llama.cpp version and has to be retested after an upgrade
- currently only the FP32 build works -> this could also depend on the current llama.cpp version, but further testing is needed (see the sketch below)
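
As a sketch of where an FP16 variant could later be toggled once it works: LLAMA_SYCL_F16 is the FP16 switch documented for llama.cpp's SYCL build (renamed GGML_SYCL_F16 in newer versions); the helper below is purely illustrative.

```rust
// Returns the extra cmake define for FP16 SYCL kernels, if requested.
// As noted above, only the FP32 build currently works, so this should
// stay disabled until a newer llama.cpp version has been tested.
fn fp16_define(enable_fp16: bool) -> Option<&'static str> {
    enable_fp16.then_some("-DLLAMA_SYCL_F16=ON")
}

fn main() {
    assert_eq!(fp16_define(false), None); // FP32-only for now
    assert_eq!(fp16_define(true), Some("-DLLAMA_SYCL_F16=ON"));
}
```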