
OpenAI-ish API with batch generation

nielstron opened this issue 1 year ago • 9 comments

This is a draft PR that serves a mostly informational purpose for anyone who wants to interact with the model in an OpenAI-API-ish fashion. I will not implement additional features upon request, but it should serve as a useful basis for any such usage.

Run the models and controller as usual, then start

python3 -m fastchat.serve.api
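(Here "as usual" means first starting a controller and at least one model worker, roughly as follows; the model path is illustrative, see the repository README for the exact flags.)

python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights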

This will host an API that takes various parameters (an example request follows the feature list). Features include:

  • batch generation (exploiting batching in the transformer model for higher throughput)
  • setting the seed
  • setting stop strings
  • choosing a device (e.g. "cuda" or "cpu" for workers)
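
A call might look roughly like this (a sketch only: the endpoint path, payload fields, and response shape are illustrative, not the final interface):

import requests

payload = {
    "prompt": ["Hello!", "Tell me a joke."],  # a list of prompts exploits batch generation
    "seed": 42,                               # fixed seed for reproducible sampling
    "stop": ["###"],                          # stop strings
    "max_tokens": 64,
}
response = requests.post("http://localhost:8000/v1/completions", json=payload)
response.raise_for_status()
for choice in response.json().get("choices", []):
    print(choice.get("text"))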

nielstron avatar Apr 12 '23 15:04 nielstron

This looks cool. Recently we did some refactoring and released a new version of weights. Could you please rebase and follow this to construct the prompt? https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/test_message.py#L30-L33

generate_stream has been moved to https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/inference.py#L58-L59
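
(Roughly, the linked lines build the prompt like this; the template key below is illustrative:)

from fastchat.conversation import conv_templates

conv = conv_templates["v1"].copy()            # pick a conversation template
conv.append_message(conv.roles[0], "Hello!")  # the user's message
conv.append_message(conv.roles[1], None)      # empty slot for the model's reply
prompt = conv.get_prompt()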

merrymercy avatar Apr 12 '23 19:04 merrymercy

Hi, I am also working on something similar, but with FastAPI as the server backend. The focus is on rapidly implementing a working, OpenAI-compatible API, without optimizations like batching or support for every parameter in OpenAI's API.
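
For illustration, a minimal OpenAI-compatible completion endpoint in FastAPI looks roughly like this (a sketch only; the handler is a stub and the field names are placeholders following OpenAI's schema):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    model: str
    prompt: str
    max_tokens: int = 16

def generate(prompt: str, max_tokens: int) -> str:
    # Placeholder; a real implementation would forward to a FastChat worker.
    return "..."

@app.post("/v1/completions")
def completions(request: CompletionRequest):
    text = generate(request.prompt, request.max_tokens)
    return {"object": "text_completion", "choices": [{"index": 0, "text": text}]}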

Would you like to actually implement this API, or is this more of a proposal? I can yield the work to you if you would like to implement a working API soon. Thanks.

suquark avatar Apr 13 '23 01:04 suquark

What do you mean by FastAPI? Is this a different repository?

As I noted above, I built this for a different project and just wanted to point others who might need similar features here. If you are building something similar, feel free to pick the bits of this implementation that are useful (e.g. the batching).

nielstron avatar Apr 13 '23 06:04 nielstron

Oh, FastAPI is just another popular web framework, similar to Flask. Currently we use it for serving with Gradio.

Thank you for showing how some similar features can be implemented in this PR.

suquark avatar Apr 13 '23 06:04 suquark

@suquark, I'm also looking for an OpenAI-compatible API. https://github.com/hyperonym/basaran/ does exactly this, but it might be best for speed, efficiency, and maintainability to implement such an API interface directly within FastChat. Are you planning to create a pull request?

Thank you @nielstron for your hard work! Will give this a try.

Thireus avatar Apr 13 '23 21:04 Thireus

@Thireus I am planning to create a pull request for this very soon, but without optimization and with only limited feature support. I would keep the initial PR as simple as possible so that it is easy to extend. Also, feel free to contribute if you would like to extend the API!

FYI: #426

suquark avatar Apr 14 '23 04:04 suquark

@merrymercy I merged main into this branch

nielstron avatar Apr 14 '23 11:04 nielstron

@suquark @merrymercy the support for batching is quite generic and independent of the API; if you are interested, I can make this a separate PR.
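
The gist is roughly the following, a simplified sketch with plain transformers rather than the exact code from this branch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/model"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.padding_side = "left"  # left-pad so every prompt ends where generation starts
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

prompts = ["Hello!", "Tell me a joke."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    # one forward pass per step for the whole batch, hence higher throughput
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))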

nielstron avatar Apr 14 '23 14:04 nielstron

Disclaimer: I just found out that this batch generation copes very badly with OOM errors, so I cannot recommend using it.
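
(If anyone wants to build on it anyway, one common mitigation, which this PR does not implement, is to catch CUDA OOM and retry with smaller batches:)

import torch

def generate_with_backoff(batch, generate_fn):
    # Sketch only: split the batch in half whenever generation hits CUDA OOM.
    try:
        return generate_fn(batch)
    except RuntimeError as e:
        if "out of memory" not in str(e) or len(batch) <= 1:
            raise
        torch.cuda.empty_cache()
        mid = len(batch) // 2
        return generate_with_backoff(batch[:mid], generate_fn) + generate_with_backoff(batch[mid:], generate_fn)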

nielstron avatar Apr 15 '23 08:04 nielstron

Closing this since the API seems to have been merged :)

nielstron avatar Apr 18 '23 12:04 nielstron