FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Results 766 FastChat issues
Sort by recently updated

I tried to run inference on my data using `get_model_answer.py` on an A100-80G, but each query took over 30 seconds. However, when I deployed the model with the OpenAI API server on the same...

Hello, I'm trying to deploy a server on an AWS machine and test the performance of the model mentioned in the title. I've launched the model worker with the following...

I've added support for the `revision` parameter in `load_model` and `load_compress_model`. It explicitly defaults to `"main"`, which is also the default in Hugging Face's `from_pretrained` methods. I believe all of the...

Given the prompt ``[['Human', 'Hello! What is your name?'], ['Assistant', None]]``, the ``count_token`` API returns 2, which is the history length (number of turns) rather than the token count. See the following screenshots: ![img_v2_02f3d397-496b-4a35-a7da-95a7c14eaefg](https://github.com/lm-sys/FastChat/assets/103977926/d43a2388-9314-4745-8ef5-1771293d99d4)...
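A minimal sketch of the distinction this report describes: counting conversation turns versus counting tokens. The `tokenize` callable is a hypothetical stand-in for a real tokenizer's encode method; `str.split` is used only for illustration.

```python
def count_prompt_tokens(messages, tokenize):
    """Count tokens over the concatenated message texts
    (skipping empty assistant slots), not the number of turns."""
    text = "\n".join(m for _, m in messages if m is not None)
    return len(tokenize(text))

messages = [["Human", "Hello! What is your name?"], ["Assistant", None]]
print(len(messages))                             # 2: the number of turns (the buggy value)
print(count_prompt_tokens(messages, str.split))  # 5: with a naive whitespace tokenizer
```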

The parameters are:

```
torchrun --nproc_per_node=1 --master_port=20001 FastChat/fastchat/train/train_mem.py \
    --model_name_or_path /home/wanghaikuan/vicuna-7b \
    --data_path /home/wanghaikuan/chat/playground_data_dummy.json \
    --bf16 False \
    --output_dir output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit ...
```

In model_worker.py, lines 102-103:

```
if hasattr(self.model.config, "max_sequence_length"):
    self.context_len = self.model.config.max_sequence_length
```

Should it be

```
if hasattr(self.model.config, "max_seq_len"):
    self.context_len = self.model.config.max_seq_len
```

to get the correct max sequence...
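A hedged sketch of a lookup that tolerates both attribute names. The attribute list is an assumption: different Hugging Face model configs expose the context window under different names, so checking several in order avoids silently falling through.

```python
def get_context_len(config, default=2048):
    """Return the model's context window, trying the attribute names
    used by various model configs; fall back to a default."""
    for attr in ("max_sequence_length", "max_seq_len", "max_position_embeddings"):
        value = getattr(config, attr, None)
        if value is not None:
            return value
    return default

class DummyConfig:  # stand-in for self.model.config
    max_seq_len = 8192

print(get_context_len(DummyConfig()))  # 8192
print(get_context_len(object()))       # 2048 (no known attribute present)
```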

```
>>> python3 -m fastchat.serve.model_worker --model-name 'RWKV-4' \
    --model-path BlinkDL/RWKV-4-Raven/RWKV-4-Raven-7B-v10x-Eng49%-Chn50%-Other1%-20230423-ctx4096 \
    --gpus 2 --host **** --worker-address http://****** --controller-address http://*****
```

Error:

```
ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
```

Currently, FastChat uses float16 instead of bfloat16 for the Guanaco model, which differs from https://github.com/artidoro/qlora. I'm wondering what influence this difference will have. Thanks.
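For background on why the dtype choice can matter: float16 has a 5-bit exponent (largest finite value 65504), while bfloat16 keeps float32's 8-bit exponent, so activation magnitudes that overflow float16 are still representable in bfloat16 (at lower precision). A stdlib-only illustration, using `struct`'s `'e'` format, which is IEEE binary16:

```python
import struct

def fits_float16(x: float) -> bool:
    """True if x packs into IEEE binary16 (float16) without overflowing."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False

print(fits_float16(65504.0))  # True: the largest finite float16 value
print(fits_float16(1e5))      # False: overflows float16, but fine in bfloat16
```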

This error was displayed during training:

Add support for Falcon. @merrymercy Why are these changes needed? For Falcon inference, we've created a new stream-generation file, using Transformers' `generate` function as a basis....

Label: new-model