How does the open-inference-protocol work with LLMs? Any use cases?
With the increasing popularity of LLMs, many companies have started looking into deploying them.
Instead of infer/predict, completions and embeddings endpoints are being used, and most of these APIs support streaming.
Example API spec:
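For illustration, here is a minimal sketch of what a completions-style call with streaming might look like. The base URL, endpoint path, and model name are assumptions for the example, not part of any finalized spec:

```python
import json

import requests

# Assumption: an OpenAI-style completions endpoint exposed by a local LLM serving runtime.
BASE_URL = "http://localhost:8080"


def stream_completion(prompt: str, model: str = "my-llm") -> None:
    """Send a completions request and print streamed tokens as they arrive."""
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={"model": model, "prompt": prompt, "max_tokens": 64, "stream": True},
        stream=True,
        timeout=60,
    )
    resp.raise_for_status()
    # Streaming responses are typically sent as server-sent events ("data: {...}" lines).
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        print(chunk["choices"][0]["text"], end="", flush=True)


if __name__ == "__main__":
    stream_completion("KServe is")
```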
I would like to check whether there are any use cases in the community that apply the open-inference-protocol to LLMs, and whether there is a roadmap to natively support or extend the open-inference-protocol for better LLM support?
The good thing about the open-inference-protocol is that it standardizes the way people interact with models, and it is very useful when developing a transformer (pre/post-processing) and integrating different transformers and predictors into an inference graph. A standard protocol also makes it easy to develop a serving runtime that supports different kinds of LLMs.
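As a rough illustration of why the standard protocol helps here, the sketch below shows a transformer that wraps raw text into an Open Inference Protocol (V2) `infer` request and forwards it to a downstream predictor. The predictor URL, model name, and input/output tensor names are assumptions for the example:

```python
import requests

PREDICTOR_URL = "http://llm-predictor:8080"  # assumption: downstream predictor service
MODEL_NAME = "my-llm"                        # assumption: model name registered on the predictor


def preprocess(text: str) -> dict:
    """Wrap raw text into a V2 inference request the predictor understands."""
    return {
        "inputs": [
            {"name": "text", "shape": [1], "datatype": "BYTES", "data": [text]}
        ]
    }


def predict(v2_request: dict) -> dict:
    """Forward the standardized request to the predictor's V2 infer endpoint."""
    resp = requests.post(
        f"{PREDICTOR_URL}/v2/models/{MODEL_NAME}/infer",
        json=v2_request,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


def postprocess(v2_response: dict) -> str:
    """Pull the generated text back out of the V2 response."""
    return v2_response["outputs"][0]["data"][0]


if __name__ == "__main__":
    print(postprocess(predict(preprocess("Tell me about KServe."))))
```

Because both ends speak the same protocol, the same transformer can sit in front of any V2-compliant predictor in the inference graph.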
Thanks.
Thanks @lizzzcai! We just added this agenda item to today's US community meeting.
Another example of an API from HuggingFace's Text Generation Inference server: https://huggingface.github.io/text-generation-inference/
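For context, a request to TGI's `/generate` endpoint looks roughly like the sketch below (the server address is an assumption; see the linked spec for the full schema and the streaming variant):

```python
import requests

TGI_URL = "http://localhost:3000"  # assumption: a locally running TGI server

resp = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is KServe?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```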
I captured some of my thoughts on the API Spec discussed here.
However, I want to bring up a topic on the API Spec. Currently the schema follows HF's. However, from my experience playing around with LLMs and some of the third-party toolkits built on top of LLMs/ChatGPT so far, I feel the OpenAI spec would be the better option, as most third-party LLM applications support it out of the box, which means a better ecosystem and user experience. If a user deploys an LLM in KServe but the API cannot be used with those LLM toolkits, it will be hard to promote adoption.
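To make the compatibility argument concrete, most third-party toolkits talk to an OpenAI-style chat completions endpoint with a payload like the one below; the base URL and model name are hypothetical stand-ins for an LLM deployed in KServe:

```python
import requests

# Assumption: an LLM exposed behind an OpenAI-compatible endpoint in KServe.
BASE_URL = "http://my-llm.example.com"
headers = {"Authorization": "Bearer not-needed"}  # many self-hosted servers ignore the key

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers=headers,
    json={
        "model": "my-llm",
        "messages": [{"role": "user", "content": "Summarize the open inference protocol."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If a KServe deployment accepted this request shape natively, existing LLM toolkits could point at it without any adapter layer.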