FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Hi, I deployed the model backend using a FastChat worker. I tested the throughput of the llama3-8b model and it reached >2500 tokens/second on an A100. However, when I started using the same...
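For reference, a standard FastChat serving stack consists of a controller, one or more model workers, and optionally an OpenAI-compatible API server. The commands below are a minimal sketch of such a deployment; the model path and ports are illustrative, and the report above does not show the exact commands used:

```bash
# Start the controller that registers workers and routes requests to them
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001

# Start a model worker for Llama 3 8B on one A100 (model path illustrative)
python3 -m fastchat.serve.model_worker \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --controller-address http://localhost:21001

# Optionally expose an OpenAI-compatible REST API for load testing
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

FastChat also ships a vLLM-based worker (`python3 -m fastchat.serve.vllm_worker`), which typically gives much higher batched throughput than the default worker; the report does not say which worker was used.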
## Why are these changes needed? - We need to know why people come to the arena, and what frustrates them about it, so we can make it better...
## Why are these changes needed? This PR addresses the issue of duplicate `GeminiAdapter` class definitions found in the codebase. The changes aim to merge the two identical class definitions,...
I've noticed that there are two identical class definitions for `GeminiAdapter` in the same file. This appears to be an unintended duplication. - File: `fastchat/model/model_adapter.py` - Lines: 1202-1212 and 2193-2205...
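For illustration, the duplication pattern looks roughly like the sketch below (the method bodies are simplified, not the file's exact contents). Because the second definition shadows the first, removing either copy should not change behavior:

```python
# Simplified sketch of fastchat/model/model_adapter.py -- not the exact file contents.

class GeminiAdapter(BaseModelAdapter):
    """First definition (around line 1202)."""

    def match(self, model_path: str):
        return "gemini" in model_path.lower()


# ... roughly a thousand lines later ...


class GeminiAdapter(BaseModelAdapter):  # noqa: F811 -- redefinition
    """Second, identical definition (around line 2193).

    Python binds the name to this later class, so the first definition
    is dead code; deleting either copy (and any duplicate
    register_model_adapter(GeminiAdapter) call) resolves the issue.
    """

    def match(self, model_path: str):
        return "gemini" in model_path.lower()
```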
Ascend NPU: how to use multiple NPUs
npu: 910B * 8, model: baichuan-13B, torch: 2.1.0, torch_npu: 2.1.0, fastchat: 0.2.36, transformers: 4.43.3. I use the command `python3 -m fastchat.serve.cli --model-path baichuan-13b/ --device npu` to run the FastChat CLI,...
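A possible multi-NPU invocation is sketched below. The CLI's `--num-gpus` flag shards a model across devices on CUDA; whether the same flag also shards across Ascend NPUs with the torch_npu backend is an assumption here and should be verified:

```bash
# Untested sketch: --num-gpus is the CLI's model-parallel flag for GPUs;
# its behavior with --device npu depends on the torch_npu backend.
python3 -m fastchat.serve.cli \
    --model-path baichuan-13b/ \
    --device npu \
    --num-gpus 8
```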
We are a GPU company and would like to add support for our GPU backend to FastChat, so we would like to submit a PR (pull request), but we do not...
I've manually added a conversation template and a model adapter for Codestral:22b-v0.1, but when I call the model, it sometimes fails to stop responding and performs poorly, but...
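One common cause of responses that never stop is a conversation template registered without `stop_str`/`stop_token_ids`. The sketch below shows how such a template is registered in FastChat; the template name, roles, separator style, and EOS token are assumptions for illustration, not a verified Codestral template (check the model's tokenizer config for the real EOS token):

```python
# Hypothetical Codestral template -- roles, sep_style, and EOS are assumptions.
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    register_conv_template,
)

register_conv_template(
    Conversation(
        name="codestral",                 # hypothetical template name
        system_message="",
        roles=("[INST]", "[/INST]"),      # Mistral-style roles, assumed
        sep_style=SeparatorStyle.LLAMA2,  # assumed separator style
        sep=" ",
        sep2=" </s><s>",                  # assumed turn separator
        stop_str="</s>",                  # without a stop string, generation can run on
        stop_token_ids=[2],               # assumed EOS id; verify against the tokenizer
    )
)
```

If the stop token is set correctly but quality is still poor, comparing the rendered prompt against the model's official chat format is usually the next step.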