FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
I am running this on CPU, and I see that when I provide a lot of input, processing it with all available CPU cores takes longer than...
Thank you so much for your fantastic work. I have run into a small problem and really hope you can help me. After setting up the OpenAI API, I tried to send 'logprobs=1'...
To accelerate evaluation, I want to generate with multiple prompts rather than only one prompt, but I got the following CUDA error. Can someone help with this? **error**: 131 ../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block:...
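One common cause of index errors like the one above when moving from single-prompt to batched generation is prompts of different lengths being stacked without padding (or a pad id outside the vocabulary). This is a framework-agnostic sketch, not FastChat's own batching code; `pad_batch` and its arguments are hypothetical names:

```python
def pad_batch(token_id_lists, pad_id):
    """Left-pad variable-length prompt token lists to a common length.

    Returns (padded_ids, attention_mask). Hypothetical helper: the point
    is only that every row must have equal length and every id must be a
    valid vocabulary index before batched generation.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    padded = [[pad_id] * (max_len - len(ids)) + ids for ids in token_id_lists]
    # 0 marks padding, 1 marks real tokens.
    mask = [[0] * (max_len - len(ids)) + [1] * len(ids) for ids in token_id_lists]
    return padded, mask


padded, mask = pad_batch([[1, 2, 3], [4]], pad_id=0)
print(padded)  # [[1, 2, 3], [0, 0, 4]]
print(mask)    # [[1, 1, 1], [0, 0, 1]]
```

Left-padding (rather than right-padding) is the usual choice for decoder-only generation so that the newest tokens line up at the end of every row.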
Hello, I tried to install FastChat with the command `pip3 install fschat`, but I didn't succeed, because when I execute my Python script ``` #!/usr/bin/python3.10 import fschat model = fschat.load_model("lmsys/fastchat-t5-3b-v1.0")...

When I use the api_server, I find that it runs very slowly. With the same hardware and environment, the web_server is 3-5 times faster. What is the reason for this, and...
Hi! I am using a single A100 GPU (40 GB). ``` export NCCL_IB_DISABLE=1; export NCCL_P2P_DISABLE=1; export NCCL_DEBUG=INFO; export NCCL_SOCKET_IFNAME=en,eth,em,bond; export CXX=g++; deepspeed --num_gpus 1 --num_nodes 1 \ fastchat/train/train_mem.py \ --model_name_or_path ../hf-llama-7B \ --data_path...
Hello everyone, can we perform LoRA or other fine-tuning on the Vicuna model? How much GPU memory is required?
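As a rough back-of-the-envelope answer to the memory question: with LoRA the frozen base weights dominate, since optimizer states are only kept for the small adapter matrices. The helper below is hypothetical (not from FastChat) and deliberately excludes activations and gradients, which depend on batch size and sequence length:

```python
def lora_memory_estimate_gb(n_params_b=7.0, bytes_per_param=2, lora_params_m=4.0):
    """Rough GPU-memory estimate for LoRA fine-tuning, in GiB.

    Assumptions (hypothetical, for illustration only):
    - base model weights frozen in fp16 (2 bytes/param),
    - Adam optimizer states (fp32 copy + 2 moments, 12 bytes/param)
      only for the LoRA adapter parameters,
    - activations/gradients excluded (workload-dependent).
    """
    base_gb = n_params_b * 1e9 * bytes_per_param / 2**30
    adapter_gb = lora_params_m * 1e6 * (bytes_per_param + 3 * 4) / 2**30
    return base_gb + adapter_gb


# A 7B model in fp16 alone is ~13 GiB before activations,
# which is why 24-40 GB cards are commonly cited for LoRA on 7B.
print(round(lora_memory_estimate_gb(), 1))
```

The takeaway is that the adapter overhead is negligible next to the base weights; the real budget question is activation memory at your chosen batch size.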
I have a Kubernetes deployment. The model worker runs on a separate node, and the controller and API server run in a different pod, but the model worker keeps re-registering, and in...
I start the FastChat controller with the default configuration: `python3 -m fastchat.serve.controller` However, when I registered the model_worker, it failed at the assertion that checks whether the status_code equals...
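Registration failures like the two issues above usually come down to the worker's HTTP call to the controller not returning success (wrong address, pod networking, controller not yet up). A common mitigation is to retry registration with backoff instead of asserting on the first response. This is a generic sketch, not FastChat's implementation; `register_with_retry` and its parameters are hypothetical:

```python
import time


def register_with_retry(register_fn, max_attempts=5, base_delay=0.5):
    """Retry a registration attempt with exponential backoff.

    register_fn is a caller-supplied callable that performs one
    registration attempt (e.g. an HTTP POST to the controller's
    register endpoint) and returns a truthy value on success.
    Returns True once registration succeeds, False after exhausting
    max_attempts.
    """
    for attempt in range(max_attempts):
        if register_fn():
            return True
        # Back off: 0.5s, 1s, 2s, ... so a slow-starting controller
        # or transient network blip does not kill the worker.
        time.sleep(base_delay * (2 ** attempt))
    return False
```

In a Kubernetes setup, verifying that the worker advertises an address the controller can actually reach back (and vice versa) is usually the first thing to check before adding retries.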