FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
I love the T5 model. https://github.com/lm-sys/FastChat/blob/a26db3c814889035d92c8ae80d6defbd7381ee55/fastchat/train/train_flant5.py#LL170C12-L170C12 It seems to use `### USER:`, but I thought it had moved over to using `` as the separator?
This pull request implements the streaming chat API following the documentation in this notebook: https://github.com/openai/openai-cookbook/blob/b92d7e7b9204ecf914a91a2781dd967aa7c52be1/examples/How_to_stream_completions.ipynb Here is example code to test with: ```python import requests import json url...
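The PR's test snippet is truncated above. As a self-contained sketch of what a client for such a streaming endpoint has to do, the parser below consumes Server-Sent-Events lines of the form `data: {...}` (the format the linked cookbook notebook describes) and reassembles the text from the delta chunks. The sample lines and field names here are illustrative, not taken from the PR itself.

```python
import json

def parse_sse_chunks(lines):
    """Yield the JSON payload of each 'data: {...}' SSE line.

    Stops at the 'data: [DONE]' sentinel used by OpenAI-style streaming APIs;
    blank keep-alive lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Example: reassemble streamed text from delta chunks (illustrative payloads).
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_chunks(raw)
)
print(text)  # Hello
```

With a real endpoint, the same generator can be fed `response.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call.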
Hello! I would like to ask about the meaning of this line: https://github.com/lm-sys/FastChat/blob/a26db3c814889035d92c8ae80d6defbd7381ee55/fastchat/serve/inference.py#L189 `max_new_tokens` reserves space for the new generation, but what is the `8` for? Thanks in advance...
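The referenced line computes how much of the context window is left for the prompt. A minimal sketch of that arithmetic, assuming (this is a guess, not confirmed by the source) that the `8` is a small safety margin for special tokens such as BOS/EOS and role separators:

```python
def max_source_length(context_len, max_new_tokens, margin=8):
    """Tokens available for the prompt after reserving room for the reply.

    `margin` is assumed here to leave headroom for special tokens; the
    rationale for the exact value 8 is what the issue is asking about.
    """
    return context_len - max_new_tokens - margin

# e.g. a 2048-token context with 512 new tokens leaves 1528 for the prompt
print(max_source_length(2048, 512))  # 1528
```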
Since version v0.2.5, the `stop` parameter in the OpenAI API seems to be set directly from `conv.stop_str` rather than from the request. https://github.com/lm-sys/FastChat/blob/v0.2.5/fastchat/serve/api.py#L134 In version v0.2.3, it worked when set in the request....
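A minimal sketch of the behavior the reporter expects: honor a client-supplied `stop`, and fall back to the conversation template's default only when the request omits it. The function name and dict-based request are illustrative, not the FastChat API's actual code.

```python
def resolve_stop(request, conv_stop_str):
    """Prefer the client-supplied 'stop' field; fall back to the template default."""
    stop = request.get("stop")
    return stop if stop is not None else conv_stop_str

print(resolve_stop({"stop": "###"}, "</s>"))  # ###
print(resolve_stop({}, "</s>"))               # </s>
```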
I've been using Vicuna for Question-Answering. I'm using the [py-bindings](https://github.com/abetlen/llama-cpp-python) (llama-cpp-python) and [LangChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/llamacpp.html). My prompt template is: ``` template = """Use the following pieces of context to answer the question...
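The reporter's template is truncated above. As an illustration of how such a context-stuffing QA prompt is typically assembled before being sent to the model, here is a sketch with placeholder wording (the template text below is not the reporter's actual template):

```python
# Illustrative QA prompt template in the style described above.
TEMPLATE = """Use the following pieces of context to answer the question.
If you don't know the answer, say so instead of making one up.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context, question):
    """Fill the template's {context} and {question} slots."""
    return TEMPLATE.format(context=context, question=question)

prompt = build_prompt("Vicuna is fine-tuned from LLaMA.", "What is Vicuna based on?")
print(prompt)
```

LangChain's `PromptTemplate` wraps the same idea with declared `input_variables`; the plain `str.format` version keeps the sketch dependency-free.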
https://huggingface.co/mosaicml/mpt-1b-redpajama-200b
Byte deltas
Instead of using parameter deltas, this implementation compares each byte of the delta with the corresponding byte of the LLaMA model and outputs the Vicuna model. This offers significantly less RAM usage compared...
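The description does not specify the per-byte operation, so here is a minimal sketch assuming the delta is the byte-wise difference mod 256 (so reconstruction is byte-wise addition mod 256), processed in fixed-size chunks so peak memory stays flat regardless of model size. The encoding is an assumption; the actual PR may use a different operation.

```python
def apply_byte_delta(base, delta, chunk_size=1 << 20):
    """Reconstruct target bytes as (base + delta) mod 256, chunk by chunk.

    Streaming over fixed-size chunks keeps peak memory at O(chunk_size)
    instead of O(model size). The additive-mod-256 encoding is assumed,
    not taken from the PR.
    """
    assert len(base) == len(delta)
    for i in range(0, len(base), chunk_size):
        b = base[i:i + chunk_size]
        d = delta[i:i + chunk_size]
        yield bytes((x + y) % 256 for x, y in zip(b, d))

base = bytes([10, 200, 255])
delta = bytes([5, 100, 2])
target = b"".join(apply_byte_delta(base, delta))
print(list(target))  # [15, 44, 1]
```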
The Vicuna tokenizer has no extra '\n' characters; the T5 tokenizer inserts them after each space. Reproduce: ```python from transformers import (T5TokenizerFast, T5ForConditionalGeneration, AutoTokenizer, LlamaTokenizer) t = T5TokenizerFast.from_pretrained('lmsys/fastchat-t5-3b-v1.0') text = 'I am a...
Hi there, I understand autoregressive decoding, which outputs tokens one by one. In a manual benchmark, our deployment generates 50 English words in 6 seconds. Is there a way to optimize...
Hi all, thanks a lot for the nice work introducing Vicuna and FastChat. I am a beginner in NLP (so correct me if I am wrong) and use GPUs with...