
Rest API for inference locally

Open mohamed-alired opened this issue 10 months ago • 5 comments

Hi, I have installed h2ogpt locally, but I want to build a frontend app on top of it, so I was wondering if there is an API I can consume, e.g. one for ingestion and another for inference.

mohamed-alired avatar Apr 15 '24 20:04 mohamed-alired

An extensive Gradio API exists; see readme_client.md and examples in test code such as test_client_chat_stream_langchain_steps3.
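As a rough sketch of using that Gradio API from a client: the server accepts a JSON-serialized kwargs dict. The field names and endpoint name below are assumptions based on the test code and may vary by version, so check readme_client.md for your installation:

```python
import json

def build_nochat_payload(instruction, langchain_mode="LLM", stream_output=False):
    """Build the JSON string payload for h2ogpt's Gradio
    /submit_nochat_api endpoint (field names are assumptions drawn
    from the test code; verify against readme_client.md)."""
    kwargs = dict(
        instruction_nochat=instruction,
        langchain_mode=langchain_mode,
        stream_output=stream_output,
    )
    return json.dumps(kwargs)

# Hypothetical usage (requires a running h2ogpt server and gradio_client):
#   from gradio_client import Client
#   client = Client("http://localhost:7860")
#   res = client.predict(build_nochat_payload("Who are you?"),
#                        api_name="/submit_nochat_api")
```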

And a full OpenAI-compatible chat API that is REST-capable exists, but it does not yet support file upload or other document operations. Is that what you are looking for?
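For illustration, a minimal REST call against an OpenAI-compatible chat endpoint could look like the sketch below. The base URL, port, model name, and API key are placeholder assumptions, not h2ogpt defaults; substitute your deployment's values:

```python
import json
import urllib.request

def build_chat_request(base_url, prompt, model="h2ogpt", api_key="EMPTY"):
    """Build an HTTP request for an OpenAI-compatible
    /v1/chat/completions endpoint. base_url, model, and api_key are
    illustrative placeholders for your own deployment."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Hypothetical usage against a locally running server:
#   req = build_chat_request("http://localhost:5000", "Hello!")
#   with urllib.request.urlopen(req) as resp:
#       reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```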

pseudotensor avatar Apr 15 '24 21:04 pseudotensor

What I am looking for is a FastAPI REST API covering the different ingestion techniques, plus a RAG completion API, so I can use h2oGPT as a RAG backend for my frontend web UI. I also wish ingestion and RAG completion accepted JSON metadata for filtering, so we could choose which files to chat with.
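The metadata filtering being requested could look something like the following sketch. This is a plain-Python illustration of the idea, not h2oGPT's implementation; the document schema and exact-match filter semantics are assumptions:

```python
def filter_docs(docs, metadata_filter):
    """Return only the docs whose metadata matches every key/value in
    metadata_filter -- the kind of pre-retrieval filter the request
    describes. Each doc is a dict with a 'metadata' sub-dict."""
    return [
        d for d in docs
        if all(d.get("metadata", {}).get(k) == v
               for k, v in metadata_filter.items())
    ]

docs = [
    {"text": "Q1 report", "metadata": {"source": "report.pdf", "year": 2024}},
    {"text": "Old memo", "metadata": {"source": "memo.txt", "year": 2022}},
]
# Only documents tagged year=2024 would reach the RAG completion step
selected = filter_docs(docs, {"year": 2024})
```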

mohamed-alired avatar Apr 15 '24 22:04 mohamed-alired

hi @mohamed-alired

I am currently building something exactly like this. It is still in development, though. You can certainly fork the repo or make PRs; the foundation is there. The project extends the official FastAPI template, so scaling and deploying won't really be much of a hassle.

check it out here: https://github.com/abuyusif01/h2ogpt-fast-api/tree/main/backend/app/h2ogpt

There are still a lot of things that need to be done, including a proper README and support for streaming the response (I planned to get this done this weekend).

Here is what we currently support:

  1. Chat with on-disk files (there is an endpoint to upload docs and to retrieve what has been uploaded, so you can select which doc to ingest)
  2. Chat with user-created pipelines (currently MongoDB-streamed data)
  3. Chat with URLs
  4. Chat with publications; we use the OpenDoaj API and scihub to download the papers.

abuyusif01 avatar Apr 17 '24 04:04 abuyusif01

Hi @abuyusif01, how are you? I am really busy, so if I have some time I will definitely open a PR, but I can give you some recommendations: don't force the inference provider on users, because I may want to use it with my existing project; also, I think you should make it possible to use local inference, like llamaCpp or something else, so it is completely local.

mohamed-alired avatar Apr 23 '24 18:04 mohamed-alired

@mohamed-alired You're right, we don't really need to enforce auth, hence its removal. I also made local inference possible using llamaCPP.

Subsequently, I restructured the repo, wrote a README, and containerized the app. It is now easy to set up and extend; check it here: https://github.com/abuyusif01/h2ogpt-fast-api
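For reference, running h2oGPT itself against a local llama.cpp (GGUF) model is typically a matter of pointing generate.py at the model file. The flag names below are assumptions based on h2oGPT's docs and may differ across versions, so verify them against the current README:

```python
def llamacpp_launch_cmd(model_path, prompt_type="llama2"):
    """Build the argv for launching h2oGPT with a local llama.cpp
    (GGUF) model. Flag names are assumptions and may differ across
    h2oGPT versions -- check the project's README."""
    return [
        "python", "generate.py",
        "--base_model=llama",
        f"--model_path_llama={model_path}",
        f"--prompt_type={prompt_type}",
    ]

cmd = llamacpp_launch_cmd("llama-2-7b-chat.Q6_K.gguf")
# e.g. subprocess.run(cmd) from the repo root, assuming the GGUF file exists
```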

@pseudotensor Since the Gradio side is relatively stable now, why not reference this in the README, so other people can use it as a starting point?

abuyusif01 avatar Apr 28 '24 02:04 abuyusif01