FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
I tried to run inference on my data using `get_model_answer.py` on an A100-80G, but each query took over 30 seconds. However, when I deployed the model with the OpenAI API on the same...
Hello, I'm trying to deploy a server on an AWS machine and test the performance of the model mentioned in the title. I've launched the model worker with the following...
I've added support for the `revision` parameter in `load_model` and `load_compress_model`. It explicitly defaults to `"main"`, which is also the default in Hugging Face's `from_pretrained` methods. I believe all of the...
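For context, a minimal sketch of how a `revision` argument can be threaded through to `from_pretrained`; this wrapper is illustrative, not FastChat's actual `load_model` signature:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_path: str, revision: str = "main", **kwargs):
    """Load a tokenizer and model pinned to a specific Hub revision.

    Defaulting to "main" matches from_pretrained, so existing
    callers see no behavior change.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, revision=revision, **kwargs
    )
    return model, tokenizer
```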
Given the prompt ``[['Human', 'Hello! What is your name?'], ['Assistant', None]]``, the ``count_token`` API returns 2, which is the history length instead of the token count. See the following screenshots: ...
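A small sketch of the distinction being reported; the Vicuna checkpoint name is only an assumption for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")

messages = [["Human", "Hello! What is your name?"], ["Assistant", None]]

# Buggy behavior: counts conversation turns, not tokens.
history_len = len(messages)  # 2

# Expected behavior: render the prompt and count its tokens.
prompt = "\n".join(f"{role}: {text}" for role, text in messages if text is not None)
token_count = len(tokenizer(prompt).input_ids)
```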
The parameters are:

```
torchrun --nproc_per_node=1 --master_port=20001 FastChat/fastchat/train/train_mem.py \
    --model_name_or_path /home/wanghaikuan/vicuna-7b \
    --data_path /home/wanghaikuan/chat/playground_data_dummy.json \
    --bf16 False --output_dir output --num_train_epochs 3 \
    --per_device_train_batch_size 2 --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 --evaluation_strategy "no" \
    --save_strategy "steps" --save_steps 1200 --save_total_limit...
```
In model_worker.py, lines 102-103:

```python
if hasattr(self.model.config, "max_sequence_length"):
    self.context_len = self.model.config.max_sequence_length
```

Should it be

```python
if hasattr(self.model.config, "max_seq_len"):
    self.context_len = self.model.config.max_seq_len
```

to get the correct max sequence...
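One defensive option (a sketch, not FastChat's actual fix) is to probe the attribute names used by different model families, since MPT-style configs expose `max_seq_len` while most Hugging Face configs use `max_position_embeddings`:

```python
def get_context_length(config, default: int = 2048) -> int:
    # Try the names used by different model families, in order.
    for attr in ("max_sequence_length", "max_seq_len", "max_position_embeddings"):
        value = getattr(config, attr, None)
        if value is not None:
            return value
    return default
```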
```
python3 -m fastchat.serve.model_worker --model-name 'RWKV-4' \
    --model-path BlinkDL/RWKV-4-Raven/RWKV-4-Raven-7B-v10x-Eng49%-Chn50%-Other1%-20230423-ctx4096 \
    --gpus 2 --host **** --worker-address http://****** --controller-address http://*****
```

Error:

```
ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
```
Currently, FastChat uses float16 instead of bfloat16 for the Guanaco model, which differs from https://github.com/artidoro/qlora. I'm wondering what influence this difference will have. Thanks.
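One relevant detail: bfloat16 keeps float32's 8-bit exponent (trading away mantissa precision), so weights trained in bfloat16 can overflow or drift when cast into float16's narrower range. Matching the QLoRA repo's dtype is a one-line change at load time; the merged checkpoint name below is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "timdettmers/guanaco-33b-merged",  # assumed merged checkpoint
    torch_dtype=torch.bfloat16,        # match QLoRA instead of float16
    device_map="auto",
)
```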
This error was displayed during training:
Add support for Falcon. @merrymercy

## Why are these changes needed?

For Falcon inference, we've created a new stream generation file, using Transformers' `generate` function as a basis....
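The PR excerpt doesn't include the implementation, but a generic sketch of streaming on top of Transformers' `generate` (here via `TextIteratorStreamer`, which may differ from the PR's actual approach) looks like this:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", device_map="auto", trust_remote_code=True
)

inputs = tokenizer("What is FastChat?", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks, so run it in a thread and consume tokens as they arrive.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128),
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```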