FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
If you have a model loaded from a JSON file, it will appear in the list of available models on the web UI, but it won't appear on the...
## Why are these changes needed?

## Related issue number (if applicable)

## Checks

- [ ] I've run `format.sh` to lint the changes in this PR.
- [ ] ...
In conversation.py (my comment):

```python
elif self.sep_style == SeparatorStyle.LLAMA3:
    # No! It's already added in encode_dialog_prompt chat_format.py
    # ret = ""
    if self.system_message:
        ret += system_prompt
```

And in encode_dialog_prompt:

```
...
```
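For reference, a minimal sketch of the duplication being described, using the Llama 3 header tokens from Meta's reference chat format (the message text is a placeholder; the exact rendering in FastChat may differ):

```python
# Sketch only: shows how the system block appears twice when both
# encode_dialog_prompt and Conversation.get_prompt() prepend it.
system_message = "You are a helpful assistant."  # placeholder
system_block = (
    "<|start_header_id|>system<|end_header_id|>\n\n" + system_message + "<|eot_id|>"
)
# encode_dialog_prompt already emits the system block; if get_prompt()
# adds system_prompt again, the rendered prompt contains it twice:
prompt = "<|begin_of_text|>" + system_block + system_block
```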
When I use FastChat to integrate vLLM, I get the error `TypeError: top_k must be an integer, got float`. The reason is that vLLM 0.5.5 added a bug fix ([#7227](https://github.com/vllm-project/vllm/pull/7227)),...
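A minimal sketch of a workaround, assuming the sampling settings arrive as an OpenAI-style JSON payload (the field values here are illustrative) before being forwarded to vLLM's `SamplingParams`, which now type-checks `top_k`:

```python
from vllm import SamplingParams

payload = {"temperature": 0.7, "top_p": 0.9, "top_k": 40.0}  # hypothetical request

# Cast top_k to int before handing it to vLLM; since the type check
# added in vllm-project/vllm#7227, a float value raises TypeError.
sampling_params = SamplingParams(
    temperature=payload["temperature"],
    top_p=payload["top_p"],
    top_k=int(payload["top_k"]),
)
```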
When trying to load quantized models I always get `ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by...`
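A minimal sketch of the workaround the error message points at, assuming transformers' `GPTQConfig` (the checkpoint name is a placeholder): disabling the ExLlama kernels allows quantized modules to sit on CPU/disk, e.g. under `device_map="auto"` offloading:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# use_exllama=False disables the ExLlama kernels; older transformers
# versions spell this flag disable_exllama=True instead.
quantization_config = GPTQConfig(bits=4, use_exllama=False)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ checkpoint
    device_map="auto",
    quantization_config=quantization_config,
)
```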
Hi team, I have a question about generating model responses using GPTQ. I've compressed Llama-2-7B with basic AutoGPTQ via transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer, ...
```
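For generating responses from the quantized checkpoint, a minimal sketch assuming the quantizer's output was saved to a local directory (the path and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./llama-2-7b-gptq",  # placeholder: directory the quantizer saved to
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("./llama-2-7b-gptq")

inputs = tokenizer("What is FastChat?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```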
## Why are these changes needed?

Adding support for LLMs by Writer (for now, our latest Palmyra-X-004 model). We'd like Palmyra-X-004 to be available on the Chatbot Arena. You...
## Why are these changes needed?

Intel Extension for PyTorch versions 2.3.110+xpu and 2.4.0+cpu implement LLM-specific optimisations; this PR makes use of them in FastChat. This seems to make...
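For context, a sketch of the documented IPEX entry point such optimisations go through (`ipex.llm.optimize`); the model choice and dtype here are illustrative, not what the PR necessarily uses:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",  # illustrative model
    torch_dtype=torch.bfloat16,
)
model.eval()
# ipex.llm.optimize applies the LLM-specific kernels and fusions
# shipped in IPEX 2.3+ (CPU) / 2.3.110+xpu builds.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)
```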