FastChat
Token usage field support for the ChatGPT-compatible Restful API
I am pleased to submit a pull request adding support for the token usage field in the ChatGPT-compatible RESTful API. The "chat_completion" function in "api.py" now returns a dictionary that includes usage information instead of a plain string. Similarly, the "generate_stream" function now yields dictionaries rather than strings, so that token usage can be tracked during generation. In addition, I added a "worker_generate" interface to the model_worker to support non-streaming result retrieval.
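As a rough sketch of the response shape this change produces (the field names follow the OpenAI chat completion schema; the helper functions and token counts below are illustrative, not FastChat's actual implementation):

```python
# Illustrative sketch only: shows why returning a dict instead of a bare
# string lets usage info travel with the generated text. Helper names and
# token counts are made up; field names follow the OpenAI schema.

def build_usage(prompt_tokens: int, completion_tokens: int) -> dict:
    """Assemble the OpenAI-style usage block from two token counts."""
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

def chat_completion_response(text: str,
                             prompt_tokens: int,
                             completion_tokens: int) -> dict:
    """Return a dict (not a bare string) so usage rides along with the text."""
    return {
        "choices": [
            {
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": build_usage(prompt_tokens, completion_tokens),
    }

resp = chat_completion_response("Hello!", prompt_tokens=12, completion_tokens=3)
print(resp["usage"]["total_tokens"])  # 15
```

A streaming variant would yield dicts of the same shape chunk by chunk, which is why "generate_stream" was changed to yield dictionaries as well.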
It seems there are some overlaps between this PR and #663. Could you try to review and comment on #663?
PR #663 adds the completions and embeddings APIs. I saw that it does not change the chat_completion API, so I believe there is no overlap with my PR, which adds the usage field to chat_completion.
@merrymercy I noticed that this PR now conflicts with the streaming support in Pull Request #858. This PR includes some features that are not yet supported by #858, including the stop list and the finish_reason "length", on which we have left comments in PR #858. I would greatly appreciate it if you could take some time to review this pull request and merge both my branch and PR #858.
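To make the two features mentioned above concrete, here is a hedged sketch of what stop-list truncation and the "length" finish reason typically look like (function names and logic are illustrative assumptions, not FastChat's actual code):

```python
# Hypothetical sketch of the two features discussed above: truncating at a
# stop string, and reporting finish_reason "length" vs. "stop". Names and
# structure are illustrative, not FastChat's implementation.

def truncate_at_stop(text: str, stop_list: list) -> tuple:
    """Cut the generated text at the earliest stop string, if any appears.

    Returns (truncated_text, hit_stop).
    """
    cut = len(text)
    hit = False
    for stop in stop_list:
        pos = text.find(stop)
        if pos != -1 and pos < cut:
            cut, hit = pos, True
    return text[:cut], hit

def finish_reason(hit_stop: bool, generated_tokens: int, max_new_tokens: int) -> str:
    """'stop' when a stop string ended generation; 'length' when the
    token budget ran out before any stop condition was met."""
    if hit_stop:
        return "stop"
    return "length" if generated_tokens >= max_new_tokens else "stop"
```

The key design point is that "length" must only be reported when the token budget, not a stop string, ended the generation, since clients use it to decide whether to continue the conversation with another request.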
@jstzwj Sure! We would like to merge this PR. #858 has been merged first because streaming is requested by many users.
#858 did some refactoring (e.g., fastchat/serve/api_server.py → fastchat/serve/openai_api_server.py, fastchat/protocol/chat_completion.py → fastchat/protocol/openai_api_protocol.py). Could you rebase? I will review your PR soon after your rebase.
Thank you for your response, we will rebase the PR from branch main soon.
@merrymercy I think we have finished implementing the usage field and fixed compatibility with OpenAI's API. Could you please review our changes and merge them into the main branch? Thank you for your attention.
@andy-yang-1 Could you also do a round of review?
@jstzwj This PR is getting better with more features! Let me know when it is ready for review.
I think I have completed the new feature implementations and it is ready for a review.
Nice work! @jstzwj I have taken the time to run the code, and overall, I believe it functions well without any major issues. I would like to suggest some improvements related to the accompanying documentation and tests, including:
@andy-yang-1 Thank you for taking the time to review my code and provide feedback. I made the suggested changes to the accompanying tests and updated the readme with more detailed information. Once again, thank you for your review and for helping me improve the code.
Let's give some time before merging for possible comments.
OK, given there are no more comments, let me merge this PR. Thanks a lot for this contribution!
@jstzwj This PR breaks the CLI inference (e.g., python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0), so I reverted it first. Could you send a new PR and let me do a rigorous test before merging?
@suquark We do not have unit tests/CI right now, but I do have a comprehensive local test workflow. I can document it more so we can be more careful when merging this kind of big refactor in the future. In short, we need to test the following three functionalities with multiple models:
- CLI inference
- Gradio web server
- OpenAI API server
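As a run-book, those three checks correspond roughly to the following commands (the module paths are FastChat's standard entry points and the model path is the one quoted earlier in this thread; ports and flags here are example values):

```shell
# Hypothetical local smoke-test run-book; the model path is only an example.
MODEL=lmsys/fastchat-t5-3b-v1.0

# 1. CLI inference
python3 -m fastchat.serve.cli --model-path "$MODEL"

# 2. Gradio web server (needs a controller and a model worker running first)
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path "$MODEL" &
python3 -m fastchat.serve.gradio_web_server

# 3. OpenAI-compatible API server (reuses the controller and worker above)
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

Running this sequence against a couple of different models before merging would have caught the CLI breakage described above.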