FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
I tried to run inference on my data using `get_model_answer.py` on an A100-80G, but each query took over 30 seconds. However, when I deployed the model with the OpenAI API on the same...
Hello, I'm trying to deploy a server on an AWS machine and test the performance of the model mentioned in the title. I've launched the model worker with the following...
I've added support for the `revision` parameter in `load_model` and `load_compress_model`. It explicitly defaults to `"main"`, which is also the default in Hugging Face's `from_pretrained` methods. I believe all of the...
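For context, a minimal sketch of how a `revision` argument can be threaded through to `from_pretrained`; this wrapper is illustrative, not FastChat's actual `load_model` signature:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_path: str, revision: str = "main", **kwargs):
    """Load a tokenizer and model pinned to a specific Hub revision.

    Defaulting to "main" matches from_pretrained, so existing
    callers see no behavior change.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, revision=revision, **kwargs
    )
    return model, tokenizer
```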
Given the prompt ``[['Human', 'Hello! What is your name?'], ['Assistant', None]]``, the ``count_token`` API returns 2, which is the history length instead of the token count. See the following screenshots: ...
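A small sketch of the distinction being reported; the Vicuna checkpoint name is only an assumption for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")

messages = [["Human", "Hello! What is your name?"], ["Assistant", None]]

# Buggy behavior: counts conversation turns, not tokens.
history_len = len(messages)  # 2

# Expected behavior: render the prompt and count its tokens.
prompt = "\n".join(f"{role}: {text}" for role, text in messages if text is not None)
token_count = len(tokenizer(prompt).input_ids)
```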
The parameters are:

```
torchrun --nproc_per_node=1 --master_port=20001 FastChat/fastchat/train/train_mem.py \
    --model_name_or_path /home/wanghaikuan/vicuna-7b \
    --data_path /home/wanghaikuan/chat/playground_data_dummy.json \
    --bf16 False --output_dir output --num_train_epochs 3 \
    --per_device_train_batch_size 2 --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 16 --evaluation_strategy "no" \
    --save_strategy "steps" --save_steps 1200 --save_total_limit...
```
In model_worker.py, lines 102-103:

```python
if hasattr(self.model.config, "max_sequence_length"):
    self.context_len = self.model.config.max_sequence_length
```

Should it be

```python
if hasattr(self.model.config, "max_seq_len"):
    self.context_len = self.model.config.max_seq_len
```

to get the correct max sequence...
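One defensive option (a sketch, not FastChat's actual fix) is to probe the attribute names used by different model families, since MPT-style configs expose `max_seq_len` while most Hugging Face configs use `max_position_embeddings`:

```python
def get_context_length(config, default: int = 2048) -> int:
    # Try the names used by different model families, in order.
    for attr in ("max_sequence_length", "max_seq_len", "max_position_embeddings"):
        value = getattr(config, attr, None)
        if value is not None:
            return value
    return default
```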
```
python3 -m fastchat.serve.model_worker --model-name 'RWKV-4' \
    --model-path BlinkDL/RWKV-4-Raven/RWKV-4-Raven-7B-v10x-Eng49%-Chn50%-Other1%-20230423-ctx4096 \
    --gpus 2 --host **** --worker-address http://****** --controller-address http://*****
```

Error:

```
ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
```
Currently, FastChat uses float16 instead of bfloat16 for the Guanaco model, which differs from https://github.com/artidoro/qlora. I'm wondering what influence this difference will have. Thanks.
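One relevant detail: bfloat16 keeps float32's 8-bit exponent (trading away mantissa precision), so weights trained in bfloat16 can overflow or drift when cast into float16's narrower range. Matching the QLoRA repo's dtype is a one-line change at load time; the merged checkpoint name below is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "timdettmers/guanaco-33b-merged",  # assumed merged checkpoint
    torch_dtype=torch.bfloat16,        # match QLoRA instead of float16
    device_map="auto",
)
```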
This error was displayed during training:
Add support for Falcon. @merrymercy

## Why are these changes needed?

For Falcon inference, we've created a new stream generation file, using Transformers' `generate` function as a basis....
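The PR excerpt doesn't include the implementation, but a generic sketch of streaming on top of Transformers' `generate` (here via `TextIteratorStreamer`, which may differ from the PR's actual approach) looks like this:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", device_map="auto", trust_remote_code=True
)

inputs = tokenizer("What is FastChat?", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks, so run it in a thread and consume tokens as they arrive.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128),
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```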