
Supporting a local LLM API server and vLLM

Open authurlord opened this issue 1 year ago • 3 comments

Thanks for your great work! Since https://github.com/lm-sys/FastChat can launch a local server for llama2/vicuna with an API that closely mirrors OpenAI's, would it be possible to support the FastChat API server, so we can run inference against a local endpoint?

Besides, are there any plans to support batch inference with https://github.com/vllm-project/vllm? The tabular data examples consist of many similar prompts, so batched inference with vLLM could speed up the whole process compared to gpt4all.
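For context, both FastChat and vLLM expose an OpenAI-compatible REST API (by convention at `http://localhost:8000/v1`, assumed here). A minimal stdlib-only sketch of what a request to such a local server looks like; the model name `vicuna-7b-v1.5` is just an illustrative example of a model FastChat can serve:

```python
import json
import urllib.request

# Assumed default address of a local FastChat/vLLM OpenAI-compatible server.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions POST request for a local server."""
    payload = {
        "model": model,  # e.g. "vicuna-7b-v1.5" served by FastChat
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Local servers typically ignore the key, but the header is expected.
            "Authorization": "Bearer EMPTY",
        },
    )

req = build_chat_request("vicuna-7b-v1.5", "Classify the sentiment: great movie!")
# urllib.request.urlopen(req) would send it once the local server is running.
```

Because the wire format matches OpenAI's, any client that lets you swap the base URL can talk to such a server unchanged.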

authurlord avatar Aug 13 '23 05:08 authurlord

Hi @authurlord,

Thank you for your suggestion. To be honest, I am not too familiar with FastChat, so I will have to investigate it further. Regarding vLLM, it will most likely be supported in some form eventually, but so far we have not done any development in this direction.

OKUA1 avatar Aug 13 '23 10:08 OKUA1

Just add the possibility to change the `openai_base` (the API base URL) and it will work.
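A hypothetical sketch of that idea: a config object holding the API base, which request builders read instead of a hard-coded constant. The names `LLMConfig` and `set_api_base` are illustrative only, not scikit-llm's actual API:

```python
class LLMConfig:
    """Illustrative config shim; not scikit-llm's real configuration class."""
    api_base = "https://api.openai.com/v1"  # default: OpenAI's endpoint

    @classmethod
    def set_api_base(cls, url: str) -> None:
        cls.api_base = url.rstrip("/")

def completions_url() -> str:
    # Every request builder reads the configured base instead of a constant,
    # so pointing the library at a local server is a one-line change.
    return f"{LLMConfig.api_base}/chat/completions"

# Redirect all requests to a local FastChat/vLLM server instead of OpenAI:
LLMConfig.set_api_base("http://localhost:8000/v1")
```

Since FastChat and vLLM speak the same wire protocol as OpenAI, no other client code would need to change.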

bacoco avatar Oct 30 '23 11:10 bacoco

@bacoco yes, this is the most straightforward solution, which we will implement for sure.

We were also thinking about some sort of deeper integration, but have not made much progress yet.

OKUA1 avatar Nov 07 '23 19:11 OKUA1

Resolved by https://github.com/iryna-kondr/scikit-llm/pull/94

iryna-kondr avatar May 24 '24 14:05 iryna-kondr