LLM inference - OpenAI Chat API or Llama-compatible chat params
Describe the feature and the current behaviour/state
Does the GenAI LLM Inference API support the OpenAI Chat API, or does it accept parameters similar to Llama models? If not, I am requesting an API similar to OpenAI Chat Completions.
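For reference, here is a minimal sketch of the request shape I have in mind, modeled on OpenAI's Chat Completions API. None of these field names exist in MediaPipe today; they are illustrative of what the LLM Inference API could accept:

```python
# Illustrative only: the OpenAI-style request shape this feature request asks for.
# The field names mirror OpenAI's Chat Completions API and are NOT an existing
# MediaPipe API.
request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize MediaPipe in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stop": ["</s>", "\n\nUser:"],  # halt generation at any of these sequences
}
```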
Please specify the use cases for this feature
For example, having a stop parameter to halt generation would be great, and a Llama-compatible chat format would make it easier to integrate with frameworks such as LangChain (see the sketch below).
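As a concrete integration example, here is a rough sketch of a LangChain wrapper. The LangChain `LLM` base class is real, but the `engine.generate_response()` call is a placeholder: MediaPipe Python bindings for LLM inference, and any native stop-sequence support, are assumptions, which is exactly the gap this issue is about:

```python
from typing import Any, Optional

from langchain_core.language_models.llms import LLM


class MediaPipeLLM(LLM):
    """LangChain wrapper around a hypothetical MediaPipe LLM inference handle."""

    engine: Any  # placeholder for a MediaPipe LLM inference instance

    @property
    def _llm_type(self) -> str:
        return "mediapipe-llm-inference"

    def _call(self, prompt: str, stop: Optional[list[str]] = None, **kwargs: Any) -> str:
        # Assumed API: generate_response() takes no stop sequences today,
        # so we truncate client-side as a workaround.
        text = self.engine.generate_response(prompt)
        for s in stop or []:
            idx = text.find(s)
            if idx != -1:
                text = text[:idx]
        return text
```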
Any other info
There is a mention of Llama in a comment in llm.h, although I am not sure what it refers to.