
GPT4All API Scaffolding. Matches OpenAI OpenAPI spec for chats and completions

Open AndriyMulyar opened this issue 1 year ago • 2 comments

Describe your changes

Scaffolding for the GPT4All API. It runs in a Docker container and should follow the OpenAI OpenAPI specification found here: https://github.com/openai/openai-openapi/blob/master/openapi.yaml

Checklist before requesting a review

  • [ ] I have performed a self-review of my code.
  • [ ] If it is a core feature, I have added thorough tests.
  • [ ] I have added thorough documentation for my code.
  • [x] I have tagged PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
  • [x] If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Demo

See README.md of gpt4all-api

https://localhost:4891/docs

Notes

Intended to fully match the OpenAI OpenAPI spec for engines, completions, chat completions, and embeddings.
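
For a sense of what that compatibility means for clients, here is a minimal sketch of a request against the demo server above. It assumes plain HTTP on localhost:4891, the plural /v1/chat/completions route discussed below, and a hypothetical model name:

```python
import requests

# Sketch: an OpenAI-style chat completion request against the local server.
resp = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "gpt4all-j",  # hypothetical name; use whatever model the server loads
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json())
```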

This can eventually support GPU inference by running a Triton inference server as a sidecar.

AndriyMulyar avatar Jun 05 '23 01:06 AndriyMulyar

Not sure if this is the place for comments. Please let me know if I can help to test things out.

I noticed that your route for chat completion is /v1/chat/completion (singular), but the openai-openapi spec has /chat/completions (plural).

Ref: openai-openapi spec L:212

daneroo avatar Jun 05 '23 06:06 daneroo

I won't get back to dev'ing on this for a day or two; my M2 Mac is giving me a hassle, so I need to move over to a Linux machine.

AndriyMulyar avatar Jun 05 '23 15:06 AndriyMulyar

> Not sure if this is the place for comments. Please let me know if I can help to test things out.
>
> I noticed that your route for chat completion is /v1/chat/completion (singular), but the openai-openapi spec has /chat/completions (plural).
>
> Ref: openai-openapi spec L:212

Addressed.

AndriyMulyar avatar Jun 06 '23 02:06 AndriyMulyar

Is there any way to contribute to this MR or body of work in a meaningful way? Would love to help out, but don't want to get in the way.

TheDropZone avatar Jun 20 '23 21:06 TheDropZone

> Is there any way to contribute to this MR or body of work in a meaningful way? Would love to help out, but don't want to get in the way.

Yes! I've been short on time to continue this PR, but if you can contribute, please do. I have uncommitted local changes that I need to push up, and I will follow up with a checklist of things that need to be implemented so this can be more collaborative.

AndriyMulyar avatar Jun 20 '23 21:06 AndriyMulyar

Amazing! Will watch for that, and see if there is anything in the checklist that I could take a stab at. Thanks!

TheDropZone avatar Jun 20 '23 22:06 TheDropZone

I removed the merge commit from the top of your branch, so it now contains only your additions. Looks good so far.

This may not be intended to be part of this initial PR, but while I'm looking at it: reloading the model on each call should not be that hard to avoid. If you keep the model around, you can prevent it from using any previous state left in the model object (so it's safe to reuse across calls) by passing n_past=0 to generate.

Mind that a given model object should only be used by one request at a time, never concurrently. A simple solution is to prevent concurrent use with an asyncio.Lock, so that concurrent requests wait their turn; a sketch of both ideas follows.
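
A minimal sketch of the lock-guarded single-model approach, assuming a FastAPI app. ModelStub is a hypothetical stand-in for the real GPT4All binding, whose generate() signature may differ; per the comment above, n_past=0 discards state from previous calls:

```python
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel

class ModelStub:
    """Placeholder for the real GPT4All model binding."""
    def generate(self, prompt: str, n_past: int = 0) -> str:
        return f"(generated text for: {prompt})"  # stand-in for real inference

class CompletionRequest(BaseModel):
    prompt: str

app = FastAPI()
model = ModelStub()          # loaded once at startup, reused across requests
model_lock = asyncio.Lock()  # the model must never be used concurrently

@app.post("/v1/completions")
async def completions(req: CompletionRequest):
    async with model_lock:   # concurrent requests queue here
        text = model.generate(req.prompt, n_past=0)  # reset leftover state
    return {"choices": [{"text": text}]}
```

In real code the blocking generate call would likely be handed off to a worker thread (e.g. asyncio.to_thread) so it doesn't stall the event loop while the lock is held.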

If you want to serve concurrent requests, you'll need multiple model instances. For that you might use the same kind of approach (maybe even the same off-the-shelf library) you'd use for a database connection pool. A pool is also a convenient way to limit how many instances are alive at any given time, which keeps you under a memory budget, especially if only one model type is supported at a time.
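
A sketch of that connection-pool-esque idea, using asyncio.Queue as a simple fixed-size pool. ModelPool and all names here are hypothetical, not anything already in this PR; the pool size caps how many instances (and how much memory) exist at once, and requests beyond that wait for a free instance:

```python
import asyncio
from contextlib import asynccontextmanager

class ModelPool:
    def __init__(self, make_model, size: int):
        self._queue: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            self._queue.put_nowait(make_model())  # pre-instantiate up to the budget

    @asynccontextmanager
    async def acquire(self):
        model = await self._queue.get()  # wait if all instances are in use
        try:
            yield model
        finally:
            self._queue.put_nowait(model)  # return the instance for reuse

# Hypothetical usage inside a request handler:
# pool = ModelPool(lambda: ModelStub(), size=2)
# async with pool.acquire() as model:
#     text = model.generate(prompt, n_past=0)
```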

LOVE the idea of a connection-pool-esque model pool!

TheDropZone avatar Jun 29 '23 21:06 TheDropZone