
Supporting a local LLM API server and vLLM

Open authurlord opened this issue 1 year ago • 3 comments

Thanks for your great work! Since https://github.com/lm-sys/FastChat can launch a local server for llama2/vicuna with an API that closely mirrors OpenAI's, would it be possible to support the FastChat API server, so we can run inference against a local endpoint?

Besides, are there any plans to support batch inference with https://github.com/vllm-project/vllm? The tabular data examples consist of many similar prompts, so batched inference with vLLM could speed up the whole process compared to gpt4all.
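For context, both FastChat and vLLM expose an OpenAI-compatible REST API (by convention at `http://localhost:8000/v1`, assumed here). A minimal stdlib-only sketch of what a request to such a local server looks like; the model name `vicuna-7b-v1.5` is just an illustrative example of a model FastChat can serve:

```python
import json
import urllib.request

# Assumed default address of a local FastChat/vLLM OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions POST request for a local server."""
    payload = {
        "model": model,  # e.g. "vicuna-7b-v1.5" served by FastChat
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Local servers typically ignore the key, but the header is expected.
            "Authorization": "Bearer EMPTY",
        },
    )

req = build_chat_request("vicuna-7b-v1.5", "Classify the sentiment: great movie!")
# urllib.request.urlopen(req) would send it once the local server is running.
```

Because the wire format matches OpenAI's, any client that lets you swap the base URL can talk to such a server unchanged.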

authurlord avatar Aug 13 '23 05:08 authurlord

Hi @authurlord,

Thank you for your suggestion. To be honest, I am not too familiar with FastChat, so I will have to investigate it further. Regarding vLLM, it will most likely be supported in some form eventually, but so far we have not done any development in this direction.

OKUA1 avatar Aug 13 '23 10:08 OKUA1

Just add the possibility to change the `openai_base` (the API base URL) and it will work.
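A hypothetical sketch of that idea: a config object holding the API base, which request builders read instead of a hard-coded constant. The names `LLMConfig` and `set_api_base` are illustrative only, not scikit-llm's actual API:

```python
class LLMConfig:
    """Illustrative config shim; not scikit-llm's real configuration class."""
    api_base = "https://api.openai.com/v1"  # default: OpenAI's endpoint

    @classmethod
    def set_api_base(cls, url: str) -> None:
        cls.api_base = url.rstrip("/")

def completions_url() -> str:
    # Every request builder reads the configured base instead of a constant,
    # so pointing the library at a local server is a one-line change.
    return f"{LLMConfig.api_base}/chat/completions"

# Redirect all requests to a local FastChat/vLLM server instead of OpenAI:
LLMConfig.set_api_base("http://localhost:8000/v1")
```

Since FastChat and vLLM speak the same wire protocol as OpenAI, no other client code would need to change.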

bacoco avatar Oct 30 '23 11:10 bacoco

@bacoco yes, this is the most straightforward solution, which we will implement for sure.

We were also thinking about some sort of deeper integration, but have not made much progress yet.

OKUA1 avatar Nov 07 '23 19:11 OKUA1

Resolved by https://github.com/iryna-kondr/scikit-llm/pull/94

iryna-kondr avatar May 24 '24 14:05 iryna-kondr