
OpenAI-ish API with batch generation

nielstron opened this issue 1 year ago • 9 comments

This is a draft PR that serves a mostly informational purpose for anyone who wants to interact with the model in an OpenAI-API-ish fashion. I will not implement additional features upon request, but it should serve as a useful basis for any such usage.

Run the models and controller as usual, then start

python3 -m fastchat.serve.api
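(Here "as usual" means first starting a controller and at least one model worker, roughly as follows; the model path is illustrative, see the repository README for the exact flags.)

python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights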

This will host an API that takes various parameters (an example request follows the feature list). Features include:

  • batch generation (exploiting batching in the transformer model for higher throughput)
  • setting the seed
  • setting stop strings
  • choosing a device (e.g. "cuda" or "cpu" for workers)
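
A call might look roughly like this (a sketch only: the endpoint path, payload fields, and response shape are illustrative, not the final interface):

import requests

payload = {
    "prompt": ["Hello!", "Tell me a joke."],  # a list of prompts exploits batch generation
    "seed": 42,                               # fixed seed for reproducible sampling
    "stop": ["###"],                          # stop strings
    "max_tokens": 64,
}
response = requests.post("http://localhost:8000/v1/completions", json=payload)
response.raise_for_status()
for choice in response.json().get("choices", []):
    print(choice.get("text"))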

nielstron avatar Apr 12 '23 15:04 nielstron

This looks cool. Recently we did some refactoring and released a new version of weights. Could you please rebase and follow this to construct the prompt? https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/test_message.py#L30-L33

generate_stream has been moved to https://github.com/lm-sys/FastChat/blob/00e432aab031becf311c33d3ecf2bd92a122ccb4/fastchat/serve/inference.py#L58-L59
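
(Roughly, the linked lines build the prompt like this; the template key below is illustrative:)

from fastchat.conversation import conv_templates

conv = conv_templates["v1"].copy()            # pick a conversation template
conv.append_message(conv.roles[0], "Hello!")  # the user's message
conv.append_message(conv.roles[1], None)      # empty slot for the model's reply
prompt = conv.get_prompt()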

merrymercy avatar Apr 12 '23 19:04 merrymercy

Hi, I am also working on something similar, but with FastAPI as the server backend. The focus is on rapidly implementing a working, OpenAI-compatible API, without optimizations like batching or support for every parameter in OpenAI's API.
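
For illustration, a minimal OpenAI-compatible completion endpoint in FastAPI looks roughly like this (a sketch only; the handler is a stub and the field names are placeholders following OpenAI's schema):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    model: str
    prompt: str
    max_tokens: int = 16

def generate(prompt: str, max_tokens: int) -> str:
    # Placeholder; a real implementation would forward to a FastChat worker.
    return "..."

@app.post("/v1/completions")
def completions(request: CompletionRequest):
    text = generate(request.prompt, request.max_tokens)
    return {"object": "text_completion", "choices": [{"index": 0, "text": text}]}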

Would you like to actually implement this API, or is this more of a proposal? I can yield the work to you if you would like to implement a working API soon. Thanks.

suquark avatar Apr 13 '23 01:04 suquark

What do you mean by FastAPI? Is this a different repository?

As I noted above, I built this for a different project and just wanted to point others who might need similar features here. If you are building something similar, feel free to pick the bits of this implementation that are useful (e.g. the batching).

nielstron avatar Apr 13 '23 06:04 nielstron

Oh, FastAPI is just another popular web framework, similar to Flask. Currently we use it for serving with Gradio.

Thank you for showing how some similar features can be implemented in this PR.

suquark avatar Apr 13 '23 06:04 suquark

@suquark, I'm also looking for an OpenAI-compatible API. https://github.com/hyperonym/basaran/ does exactly this, but it might be best for speed, efficiency, and maintainability to implement such an API interface directly within FastChat. Are you planning to create a pull request?

Thank you @nielstron for your hard work! Will give this a try.

Thireus avatar Apr 13 '23 21:04 Thireus

@Thireus I am planning to create a pull request for this very soon, but without optimization and with only limited feature support. I would keep the initial PR as simple as possible so that it is easy to extend. Also, feel free to contribute if you would like to extend the API!

FYI: #426

suquark avatar Apr 14 '23 04:04 suquark

@merrymercy I merged main into this branch

nielstron avatar Apr 14 '23 11:04 nielstron

@suquark @merrymercy the support for batching is quite generic and independent of the API; if you are interested, I can make this a separate PR.
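
The gist is roughly the following, a simplified sketch with plain transformers rather than the exact code from this branch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/model"  # illustrative; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.padding_side = "left"  # left-pad so every prompt ends where generation starts
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path)

prompts = ["Hello!", "Tell me a joke."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    # one forward pass per step for the whole batch, hence higher throughput
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))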

nielstron avatar Apr 14 '23 14:04 nielstron

Disclaimer: I just found out that this batch generation copes very badly with OOM errors, so I cannot recommend using it.
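
(If anyone wants to build on it anyway, one common mitigation, which this PR does not implement, is to catch CUDA OOM and retry with smaller batches:)

import torch

def generate_with_backoff(batch, generate_fn):
    # Sketch only: split the batch in half whenever generation hits CUDA OOM.
    try:
        return generate_fn(batch)
    except RuntimeError as e:
        if "out of memory" not in str(e) or len(batch) <= 1:
            raise
        torch.cuda.empty_cache()
        mid = len(batch) // 2
        return generate_with_backoff(batch[:mid], generate_fn) + generate_with_backoff(batch[mid:], generate_fn)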

nielstron avatar Apr 15 '23 08:04 nielstron

Closing this since the API seems to have been merged :)

nielstron avatar Apr 18 '23 12:04 nielstron