FastChat
OpenAI-ish API with batch generation
This is a draft PR that serves a mostly informational purpose for anyone wanting to interact with the model in an OpenAI-API-ish fashion. I will not implement additional features upon request, but it should serve as a useful basis for anyone with similar needs.
Run the models and controller as usual, then start:

```
python3 -m fastchat.serve.api
```
This will host an API that takes various parameters (see the example request after the list). Features include:
- batch generation (exploiting batching in the transformer model for higher throughput)
- setting the seed
- setting stop strings
- choosing a device (e.g. "cuda" or "cpu") for workers
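For illustration, here is what a batched request to this API might look like. This is a hypothetical sketch: the endpoint path and parameter names are assumptions for illustration, not taken from the actual PR.

```python
# Hypothetical client request; the endpoint path and parameter names are
# illustrative assumptions, not the PR's actual interface.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        # a batch of prompts, generated together instead of one by one
        "prompts": ["Hello, who are you?", "Name three uses of batching."],
        "seed": 42,            # fixed seed for reproducible sampling
        "stop": ["\n###"],     # generation halts when this string appears
        "device": "cuda",      # device used by the workers
        "max_new_tokens": 128,
    },
)
print(response.json())
```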
This looks cool. Recently we did some refactoring and released a new version of the weights. Could you please rebase and follow this to construct the prompt? https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/test_message.py#L30-L33
The `generate_stream` function has moved to https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/inference.py#L58-L59
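For reference, the prompt-construction pattern in the linked lines looks roughly like the following; the helper names are taken from the referenced file at that commit and may have changed in later versions:

```python
# Sketch of the prompt construction referenced above, based on
# fastchat/serve/test_message.py at commit 00e432a; may differ in newer versions.
from fastchat.conversation import default_conversation

conv = default_conversation.copy()
conv.append_message(conv.roles[0], "Tell me a joke.")  # user turn
conv.append_message(conv.roles[1], None)               # empty slot for the model's reply
prompt = conv.get_prompt()                             # serialized prompt string for the model
```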
Hi, I am also working on something similar, but with FastAPI as the server backend. The focus is on rapidly implementing a working, OpenAI-compatible API, without actual optimizations like batching and without supporting all the parameters of OpenAI's API (a rough sketch of that shape follows this comment).
Would you like to actually implement this API, or is this more of a proposal? I can yield the work to you if you would like to implement a working API soon. Thanks.
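For context, a minimal sketch of what such an OpenAI-compatible completions endpoint in FastAPI could look like; the route shape follows OpenAI's `/v1/completions`, and `generate_text` is a hypothetical placeholder, not the actual implementation:

```python
# Minimal sketch of an OpenAI-style completions endpoint served with FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    model: str
    prompt: str
    max_tokens: int = 16
    temperature: float = 1.0

def generate_text(req: CompletionRequest) -> str:
    # Hypothetical placeholder: a real server would forward this to a model worker.
    return "..."

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    # Mirror the shape of OpenAI's completions response.
    return {
        "object": "text_completion",
        "model": req.model,
        "choices": [{"text": generate_text(req), "index": 0, "finish_reason": "stop"}],
    }
```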
What do you mean by FastAPI? Is this a different repository?
As I noted above, I built this for a different project and just wanted to point others who might need similar features here. Also, if you are building something similar, feel free to pick any bits from this implementation that are useful (e.g. the batching).
Oh, FastAPI is just another popular web framework, similar to Flask. Currently we use it for serving with Gradio.
Thank you for showing how some similar features can be implemented in this PR.
@suquark, I'm also looking for an OpenAI-compatible API. https://github.com/hyperonym/basaran/ does exactly this, but it might be best for speed, efficiency, and maintenance to implement such an API interface directly within FastChat. Are you planning to create a pull request?
Thank you @nielstron for your hard work! Will give this a try.
@Thireus I am planning to create a pull request for this very soon, but without optimizations and with only limited feature support. I would keep the initial PR as simple as possible so it will be easy to extend. Also, feel free to contribute if you would like to extend the API!
FYI: #426
@merrymercy I merged main into this branch
@suquark @merrymercy The support for batching is quite generic and independent of the API (see the sketch below); if you are interested, I can make this a separate PR.
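To illustrate what "generic batching" means here, a minimal sketch of batched generation with Hugging Face transformers; the model name and parameters are illustrative, not the PR's code:

```python
# Minimal sketch of batched generation with transformers; left padding lets
# all prompts in the batch end at the same position before decoding starts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = ["Hello, who are you?", "What is the capital of France?"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# One forward pass per decoding step covers the whole batch, which is where
# the throughput gain over generating each prompt sequentially comes from.
with torch.no_grad():
    outputs = model.generate(
        **inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id
    )

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```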
Disclaimer: I just found out that this batch generation copes very badly with OOM errors, so I cannot recommend its use as-is.
Closing this since the API seems to be merged :)