FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Since Phi-3 mini did so well on the leaderboard, it would be interesting to see where the new small and medium models land. With Phi-3 vision, it also seems like...
During streaming output, does FastChat stop generating tokens after the frontend disconnects (e.g. via `AbortController`)? If so, could you please let me know where the code...
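For context, the general pattern for stopping generation when the client goes away is to poll for the disconnect inside the streaming loop. A minimal sketch using Starlette's `Request.is_disconnected()`, not FastChat's actual implementation; the endpoint and generator below are placeholders:

```python
# A minimal sketch of server-side disconnect handling (not FastChat's code):
# the streaming loop polls Request.is_disconnected() and stops pulling
# tokens once the client has aborted the request (e.g. via AbortController).
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_token_stream():
    # Placeholder generator standing in for the model's token stream.
    for i in range(1000):
        yield f"token-{i} "

@app.post("/stream")
async def stream(request: Request):
    async def event_gen():
        async for token in fake_token_stream():
            if await request.is_disconnected():
                # Client aborted; stop consuming tokens from the model.
                break
            yield token
    return StreamingResponse(event_gen(), media_type="text/plain")
```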
I spawn an OpenAI-compatible server using the following docker-compose:

```yaml
version: "3"
services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", ...
```
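Once a stack like this is up, the OpenAI-compatible endpoint can be exercised as below. A minimal sketch that assumes the openai_api_server is exposed on port 8000 and that a model named `vicuna-7b-v1.5` is registered; the compose file above is truncated, so the actual port and model name may differ:

```python
# Query the OpenAI-compatible server started by the compose stack.
# Base URL and model name are assumptions; adjust to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```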
When I run

```shell
deepspeed fastchat/train/train_lora.py \
    --model_name_or_path /root/autodl-tmp/cjk/Fast-Chat-main/Codellama-7B \
    --lora_r 16 --lora_alpha 16 --lora_dropout 0.05 \
    --data_path /root/autodl-tmp/cjk/Fast-Chat-main/data/Tool_ReAct_train_bird_Qshot.json \
    --bf16 True --output_dir ./checkpoints --num_train_epochs 8 \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 --evaluation_strategy "no" ...
```
## Why are these changes needed?

vLLM exposes a parameter called max_model_len. If the user overrides max_model_len, it no longer matches context_len when the vLLM worker starts up.

## Related issue number (if applicable)
...
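A minimal sketch of the idea behind such a fix, using a hypothetical helper rather than FastChat's actual worker code: prefer an explicitly configured max_model_len over the context length read from the model config, so the worker reports the limit vLLM actually enforces.

```python
# Hypothetical illustration of the intent, not FastChat's real code.
from typing import Optional

def resolve_context_len(config_context_len: int,
                        vllm_max_model_len: Optional[int]) -> int:
    """Prefer vLLM's max_model_len (if the user overrode it) over the
    context length inferred from the model config."""
    if vllm_max_model_len is not None:
        return vllm_max_model_len
    return config_context_len

# Example: the user starts vLLM with --max-model-len 8192 while the model
# config advertises 32768; the worker should report 8192.
assert resolve_context_len(32768, 8192) == 8192
```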
```shell
python -m fastchat.serve.vllm_worker --model-path "/home/incar/newdata2/tms/llm/chatglm3-6b-32k" --trust-remote-code
```

```
Traceback (most recent call last):
  File "/home/incar/miniconda3/envs/chatlain/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/incar/miniconda3/envs/chatlain/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
    ...
```
## Why are these changes needed?

Fix GritLM:
- The chat template is now correct.
- Embedding now works correctly; values match the GritLM README example exactly.
- In order to...
Currently, I'm using fastchat==0.2.36 and vllm==0.4.3 to deploy a Qwen model as an inference service. Here's the command for starting the service on my two servers. server1: `python3.9 -m fastchat.serve.vllm_worker --model-path /Qwen2-AWQ...
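When two vllm_worker processes register the same model name with one controller, the controller dispatches requests between them. A quick way to confirm both workers are registered is to query the controller directly; a minimal sketch, assuming the controller runs at its default address http://localhost:21001 and that the endpoint names match fastchat.serve.controller at this version:

```python
# Sanity-check worker registration against the FastChat controller.
# Address, endpoint names, and the model name are assumptions for this version.
import requests

CONTROLLER = "http://localhost:21001"

# Ask the controller to re-poll its registered workers.
requests.post(f"{CONTROLLER}/refresh_all_workers")

# List the model names the controller currently knows about.
models = requests.post(f"{CONTROLLER}/list_models").json()["models"]
print("registered models:", models)

# Ask which worker the controller would route a request for this model to
# ("Qwen2-AWQ" is a placeholder; use the name your workers report).
addr = requests.post(
    f"{CONTROLLER}/get_worker_address", json={"model": "Qwen2-AWQ"}
).json()["address"]
print("dispatched worker:", addr)
```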
## Why are these changes needed?

These simple changes add GPT-4o support to the llm-judge. As GPT-4o is much faster and much cheaper than GPT-4 (and also better in...
Installation command: pip install git+https://github.com/lm-sys/FastChat.git
Command run: python3 -m fastchat.serve.cli --model-path ./model/glm-4-9b-chat/
Input: "你好" ("Hello") and similar prompts
Output: the model keeps generating without stopping
I installed the latest version; does the automatically matched glm-4 template still have a bug?
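Endless generation with a newly released model usually means the auto-selected conversation template lacks the model's stop strings or stop token IDs. A minimal sketch of registering a custom template via fastchat.conversation, not a verified fix; the template name, role tags, and stop strings below are assumptions based on the glm-4-9b-chat chat format and may need adjustment:

```python
# Hypothetical illustration: register a conversation template whose stop
# strings match GLM-4's special tokens so generation halts instead of
# running on.  Role tags and stop strings are assumptions, not verified.
from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

register_conv_template(
    Conversation(
        name="glm-4-example",  # placeholder name
        system_message="",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="\n",
        stop_str=["<|user|>", "<|observation|>", "<|endoftext|>"],
    )
)
```

If your FastChat version exposes a `--conv-template` flag on the CLI, passing the registered template's name there should force its use instead of the automatically matched one.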