FastChat
Token usage field support for the ChatGPT-compatible Restful API
I am pleased to submit a pull request adding support for the token usage field in the ChatGPT-compatible RESTful API. The "chat_completion" function in "api.py" now returns a dictionary that includes usage information instead of a plain string. Similarly, the "generate_stream" function now yields dictionaries rather than strings, so that token usage can be tracked during generation. In addition, I added a "worker_generate" interface to the model_worker to support non-streaming result retrieval.
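As a rough sketch of the response shape this change produces (the field names follow the OpenAI chat completion schema; the helper functions and token counts below are illustrative, not FastChat's actual implementation):

```python
# Illustrative sketch only: shows why returning a dict instead of a bare
# string lets usage info travel with the generated text. Helper names and
# token counts are made up; field names follow the OpenAI schema.

def build_usage(prompt_tokens: int, completion_tokens: int) -> dict:
    """Assemble the OpenAI-style usage block from two token counts."""
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

def chat_completion_response(text: str,
                             prompt_tokens: int,
                             completion_tokens: int) -> dict:
    """Return a dict (not a bare string) so usage rides along with the text."""
    return {
        "choices": [
            {
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
        "usage": build_usage(prompt_tokens, completion_tokens),
    }

resp = chat_completion_response("Hello!", prompt_tokens=12, completion_tokens=3)
print(resp["usage"]["total_tokens"])  # 15
```

A streaming variant would yield dicts of the same shape chunk by chunk, which is why "generate_stream" was changed to yield dictionaries as well.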
It seems there are some overlaps between this PR and #663. Could you try to review and comment on #663?
PR #663 adds the completions and embeddings APIs. I saw that it does not change the chat_completion API, so I believe there is no overlap with my PR, which adds the usage field to chat_completion.
@merrymercy I noticed that this PR now conflicts with the streaming support in Pull Request #858. This PR includes some features that are not yet supported by #858, including the stop list and the finish_reason "length", on which we have left comments in PR #858. I would greatly appreciate it if you could take some time to review this pull request and merge both my branch and PR #858.
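To make the two features mentioned above concrete, here is a hedged sketch of what stop-list truncation and the "length" finish reason typically look like (function names and logic are illustrative assumptions, not FastChat's actual code):

```python
# Hypothetical sketch of the two features discussed above: truncating at a
# stop string, and reporting finish_reason "length" vs. "stop". Names and
# structure are illustrative, not FastChat's implementation.

def truncate_at_stop(text: str, stop_list: list) -> tuple:
    """Cut the generated text at the earliest stop string, if any appears.

    Returns (truncated_text, hit_stop).
    """
    cut = len(text)
    hit = False
    for stop in stop_list:
        pos = text.find(stop)
        if pos != -1 and pos < cut:
            cut, hit = pos, True
    return text[:cut], hit

def finish_reason(hit_stop: bool, generated_tokens: int, max_new_tokens: int) -> str:
    """'stop' when a stop string ended generation; 'length' when the
    token budget ran out before any stop condition was met."""
    if hit_stop:
        return "stop"
    return "length" if generated_tokens >= max_new_tokens else "stop"
```

The key design point is that "length" must only be reported when the token budget, not a stop string, ended the generation, since clients use it to decide whether to continue the conversation with another request.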
@jstzwj Sure! We would like to merge this PR. #858 has been merged first because streaming is requested by many users.
#858 did some refactoring (e.g., fastchat/serve/api_server.py → fastchat/serve/openai_api_server.py, fastchat/protocol/chat_completion.py → fastchat/protocol/openai_api_protocol.py). Could you rebase? I will review your PR soon after your rebase.
Thank you for your response, we will rebase the PR from branch main soon.
@merrymercy I think we have finished implementing the usage field and fixed compatibility with OpenAI's API. Could you please review our changes and merge them into the main branch? Thank you for your attention.
@andy-yang-1 Could you also do a round of review?
@jstzwj This PR is getting better with more features! Let me know when it is ready for review.
I think I have completed the new feature implementations and it is ready for a review.
Nice work! @jstzwj I have taken the time to run the code, and overall, I believe it functions well without any major issues. I would like to suggest some improvements related to the accompanying documentation and tests, including:
@andy-yang-1 Thank you for taking the time to review my code and provide feedback. I made the suggested changes to the accompanying tests and updated the readme with more detailed information. Once again, thank you for your review and for helping me improve the code.
Let's give some time before merging for possible comments.
OK, given there are no more comments, let me merge this PR. Thanks a lot for this contribution!
@jstzwj This PR breaks the CLI inference (e.g., python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0), so I reverted it first. Could you send a new PR and let me do a rigorous test before merging?
@suquark We do not have unit tests/CI right now, but I do have a comprehensive local test workflow. I can document it more so we can be more careful when merging this kind of big refactor in the future. In short, we need to test the following three functionalities with multiple models:
- CLI inference
- Gradio web server
- OpenAI API server
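As a run-book, those three checks correspond roughly to the following commands (the module paths are FastChat's standard entry points and the model path is the one quoted earlier in this thread; ports and flags here are example values):

```shell
# Hypothetical local smoke-test run-book; the model path is only an example.
MODEL=lmsys/fastchat-t5-3b-v1.0

# 1. CLI inference
python3 -m fastchat.serve.cli --model-path "$MODEL"

# 2. Gradio web server (needs a controller and a model worker running first)
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path "$MODEL" &
python3 -m fastchat.serve.gradio_web_server

# 3. OpenAI-compatible API server (reuses the controller and worker above)
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

Running this sequence against a couple of different models before merging would have caught the CLI breakage described above.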