HuggingFace Hub
Hello,
Thank you for your hard work.
I tried to run the code bench locally (on an RTX 3060 12GB) but was hitting issues. I know, however, that it is possible to use Hugging Face Hub inference; would it be of interest to set up a Hugging Face Hub runner? I may have some scope to contribute if that could be of use.
Hi, that sounds like a nice idea. Similarly, we might also want to support API providers like Together or Fireworks.
Yeah sounds like a good idea!
Thinking about it from a software point of view: maybe an abstract API runner class that sets up the runner and the client, with one implementation per API provider? I think the OpenAI runner could potentially move to this sort of framework as well. A rough sketch follows the list below.
From a quick test I had to:
- Declare the model style
- Implement a runner class (I implemented `run_batch`; I'm not sure which methods are compulsory for the tests to run?)
- Add it to the runner utils
- Create the prompt (this could be problematic, because the prompt depends on the model; e.g. on HF you can run CodeLlama / Gemini / Mistral ..., which each work best with their specific prompts. I wonder if mixin classes that say "I am using an API, but also this model family" would work? See the second sketch below.)
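Roughly, something like this. All the names here (`APIRunner`, `HFHubRunner`, `_make_client`, `_run_single`) are placeholders I made up for illustration, not the repo's actual interface:

```python
from abc import ABC, abstractmethod


class APIRunner(ABC):
    """Base class for runners that call a hosted inference API."""

    def __init__(self, model: str, max_tokens: int = 512, temperature: float = 0.2):
        self.model = model
        self.max_tokens = max_tokens
        self.temperature = temperature
        self.client = self._make_client()

    @abstractmethod
    def _make_client(self):
        """Create the provider-specific client (OpenAI, HF Hub, Together, ...)."""

    @abstractmethod
    def _run_single(self, prompt: str) -> str:
        """Send one prompt to the provider and return the completion text."""

    def run_batch(self, prompts: list[str]) -> list[str]:
        # Naive sequential batching; providers with native batch
        # endpoints can override this.
        return [self._run_single(p) for p in prompts]


class HFHubRunner(APIRunner):
    """Example concrete runner backed by the Hugging Face Inference API."""

    def _make_client(self):
        from huggingface_hub import InferenceClient  # pip install huggingface_hub
        return InferenceClient(model=self.model)

    def _run_single(self, prompt: str) -> str:
        return self.client.text_generation(
            prompt,
            max_new_tokens=self.max_tokens,
            temperature=self.temperature,
        )
```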
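And the mixin idea could look like this (the prompt templates here are illustrative only; the real ones would come from the benchmark's existing prompt definitions):

```python
class CodeLlamaPromptMixin:
    """Prompt formatting for CodeLlama-Instruct models (illustrative template)."""

    def format_prompt(self, question: str) -> str:
        return f"[INST] {question.strip()} [/INST]"


class MistralPromptMixin:
    """Prompt formatting for Mistral-Instruct models (illustrative template)."""

    def format_prompt(self, question: str) -> str:
        return f"<s>[INST] {question.strip()} [/INST]"


# Combining a provider with a model family would then be a one-liner,
# e.g. with the HFHubRunner sketched above:
# class HFHubCodeLlamaRunner(CodeLlamaPromptMixin, HFHubRunner):
#     pass
```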
Thanks, yes this sounds reasonable. Adding an additional flag to choose between an API provider and vLLM would work well, e.g. something like the sketch below.
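For instance, assuming the benchmark's CLI uses argparse (the flag name and the choices here are just a suggestion):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--backend",
    choices=["vllm", "openai", "hf_hub", "together", "fireworks"],
    default="vllm",
    help="Inference backend or API provider to run against.",
)
args = parser.parse_args()
# Downstream, args.backend would select which runner class to instantiate.
```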
It would be great to have API support; thanks for your contributions.
Also, OpenRouter support would be much appreciated. With OR we can choose almost any model.
Hello, I have added support for the OpenAI-style interface: through the HTTP interface of an inference framework such as vLLM, you can call it directly to obtain the generated content. I can contribute the code if needed.
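For reference, the call pattern looks roughly like this, assuming a vLLM server started with its OpenAI-compatible entrypoint (`python -m vllm.entrypoints.openai.api_server --model <model>`); the URL and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",  # vLLM does not check the API key by default
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].message.content)
```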
Wow, that's very cool! PR please. @Naman-ntc please approve.