[FT] Support llama.cpp inference
Issue encountered
Currently, inference of open models on my Mac is quite slow because vLLM does not support MPS (Apple's Metal Performance Shaders backend).
Solution/Feature
llama.cpp does support GPU acceleration on Apple Silicon (via Metal) and would significantly speed up local evaluation of open models.
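For reference, a minimal sketch of what such a backend could wrap, using the llama-cpp-python bindings with full Metal offload. The model path, prompt, and parameters below are placeholders, not an actual lighteval API:

```python
# Minimal sketch: llama-cpp-python with Metal (Apple GPU) offload.
# Assumes `pip install llama-cpp-python` built with Metal support (the default on macOS)
# and a local GGUF model file; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder GGUF path
    n_gpu_layers=-1,   # offload all layers to the Apple GPU via Metal
    n_ctx=4096,        # context window
    logits_all=True,   # needed if per-token logprobs are required for loglikelihood metrics
    verbose=False,
)

# Greedy generation, roughly what a backend would do for generative tasks.
out = llm(
    "Question: What is the capital of France?\nAnswer:",
    max_tokens=16,
    temperature=0.0,
)
print(out["choices"][0]["text"])
```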
Possible alternatives
Allowing the use of the MPS device in the other model-loading paths (e.g. the transformers-based backend) would also work; a rough sketch is below.
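A minimal sketch of that alternative, running a transformers model directly on the MPS device. The model name is a placeholder, and this assumes the backend would accept `device="mps"` (or detect it automatically):

```python
# Minimal sketch: loading and generating with a transformers model on MPS.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model id

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to fit on a Mac GPU
).to(device)

inputs = tok("The capital of France is", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```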