REST API for inference locally
Hi, I have installed h2ogpt locally, but I want to build a frontend app on top of it, so I was wondering if there's an API I can consume, such as one for ingestion and another for inference.
An extensive Gradio API exists; see readme_client.md and the examples in test code like test_client_chat_stream_langchain_steps3.
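For reference, a minimal sketch of calling that Gradio API from Python, assuming a local h2ogpt server on the default port 7860 (the exact api_name values and payload schema are documented in readme_client.md):

```python
import ast
from gradio_client import Client

# Connect to a locally running h2ogpt Gradio server (default port assumed).
client = Client("http://localhost:7860")

# The nochat API takes a stringified dict of kwargs and returns a stringified dict.
kwargs = dict(instruction_nochat="Who are you?")
res = client.predict(str(kwargs), api_name="/submit_nochat_api")
print(ast.literal_eval(res)["response"])
```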
A full OpenAI-compatible chat API that is REST-capable also exists, but file upload and similar features are not supported by it yet. Is that what you are looking for?
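Since that API follows the OpenAI schema, a plain REST call works; a minimal sketch, assuming the server is listening on localhost:5000 (the port and model name depend on how the server was launched):

```python
import requests

# Standard OpenAI-style chat completion request against the h2ogpt server.
resp = requests.post(
    "http://localhost:5000/v1/chat/completions",
    json={
        "model": "h2ogpt",  # placeholder; the model name depends on server config
        "messages": [{"role": "user", "content": "Summarize what h2oGPT does."}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```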
What I am looking for is a FastAPI REST API for the different ingestion techniques, plus a RAG completion API, so I can use h2oGPT as a RAG backend for my frontend web UI. I also wish you included JSON metadata for filtering during ingestion and RAG completion, so we can choose which files to chat with.
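To make the request concrete, here is a purely hypothetical sketch of the kind of FastAPI surface being asked for; none of these routes or fields exist in h2ogpt, and the names (/ingest, /rag/completions, metadata_filter) are made up for illustration:

```python
from fastapi import FastAPI, UploadFile, Form
from pydantic import BaseModel

app = FastAPI()

class RagRequest(BaseModel):
    query: str
    # Hypothetical JSON metadata filter, e.g. {"source": "contracts", "year": 2023},
    # used to restrict retrieval to matching documents.
    metadata_filter: dict = {}

@app.post("/ingest")
async def ingest(file: UploadFile, metadata: str = Form("{}")):
    # Hypothetical: store the file plus caller-supplied JSON metadata
    # so it can be filtered on at query time.
    ...

@app.post("/rag/completions")
async def rag_completion(req: RagRequest):
    # Hypothetical: retrieve only chunks whose metadata matches
    # req.metadata_filter, then run generation over them.
    ...
```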
Hi @mohamed-alired,
I'm currently building something exactly like this. It's still in development, though. You can certainly fork the repo or make PRs; the foundation is there. The project extends the official FastAPI template, so scaling and deploying won't be much of a hassle.
check it out here: https://github.com/abuyusif01/h2ogpt-fast-api/tree/main/backend/app/h2ogpt
There are still a lot of things that need to be done, including a proper README and support for streaming the response (I plan to get this done this weekend).
Here is what we currently support:
- Chat with on-disk files (there's an endpoint to upload docs and retrieve what has been uploaded, so you can select which doc to ingest; see the sketch after this list)
- Chat with user-created pipelines (currently MongoDB streamed data)
- Chat with URLs
- Chat with publications; we use the OpenDoaj API and scihub to download the papers.
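Since the README is still pending, here is a hypothetical sketch of how a client might drive the upload-then-chat flow described above; the base URL, route names, and response shapes are all guesses, so check the repo's router definitions for the real ones:

```python
import requests

BASE = "http://localhost:8000/api/v1"  # assumed base URL of the FastAPI backend

# Hypothetical: upload a document, then list what has been uploaded.
with open("paper.pdf", "rb") as f:
    requests.post(f"{BASE}/docs/upload", files={"file": f}).raise_for_status()
docs = requests.get(f"{BASE}/docs").json()

# Hypothetical: chat against a selected document.
answer = requests.post(
    f"{BASE}/chat/doc",
    json={"doc_id": docs[0]["id"], "query": "What is the main finding?"},
).json()
print(answer)
```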
Hi @abuyusif01, how are you? I'm really busy, so if I have some time I will definitely open a PR, but I can give you some recommendations: don't force the inference on users, because I may want to use it with my existing project. Also, I think you should make local inference possible with llamaCpp or something else, so it's completely local.
@mohamed-alired You're right, we don't really need to enforce auth, hence its removal. I also made local inference possible using llamaCPP.
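For context, local inference via llamaCPP boils down to loading a GGUF model with llama-cpp-python; a minimal sketch, with the model path as a placeholder:

```python
from llama_cpp import Llama

# Load a local GGUF model; the path is a placeholder for whatever model you use.
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```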
Subsequently, I restructured the repo, wrote a README, and containerized the app. It's now easy to set up and extend; check it out here: https://github.com/abuyusif01/h2ogpt-fast-api
@pseudotensor Since the Gradio API is relatively stable now, why not reference this in the README, so other people can use it as a starting point?