FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
I love the T5 model. https://github.com/lm-sys/FastChat/blob/a26db3c814889035d92c8ae80d6defbd7381ee55/fastchat/train/train_flant5.py#LL170C12-L170C12 It seems to use `### USER:`, but I thought it had moved over to using `` as the separator?
This pull request implements the streaming chat API following the documentation in this notebook: https://github.com/openai/openai-cookbook/blob/b92d7e7b9204ecf914a91a2781dd967aa7c52be1/examples/How_to_stream_completions.ipynb Here is example code to test with: ```python import requests import json url...
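The PR's test snippet is truncated above. As a self-contained sketch of what a client for such a streaming endpoint has to do, the parser below consumes Server-Sent-Events lines of the form `data: {...}` (the format the linked cookbook notebook describes) and reassembles the text from the delta chunks. The sample lines and field names here are illustrative, not taken from the PR itself.

```python
import json

def parse_sse_chunks(lines):
    """Yield the JSON payload of each 'data: {...}' SSE line.

    Stops at the 'data: [DONE]' sentinel used by OpenAI-style streaming APIs;
    blank keep-alive lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Example: reassemble streamed text from delta chunks (illustrative payloads).
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_chunks(raw)
)
print(text)  # Hello
```

With a real endpoint, the same generator can be fed `response.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call.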
Hello! I would like to ask about the meaning of this line: https://github.com/lm-sys/FastChat/blob/a26db3c814889035d92c8ae80d6defbd7381ee55/fastchat/serve/inference.py#L189 `max_new_tokens` reserves space for the new generation, but what is the `8` for? Thanks in advance...
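The referenced line computes how much of the context window is left for the prompt. A minimal sketch of that arithmetic, assuming (this is a guess, not confirmed by the source) that the `8` is a small safety margin for special tokens such as BOS/EOS and role separators:

```python
def max_source_length(context_len, max_new_tokens, margin=8):
    """Tokens available for the prompt after reserving room for the reply.

    `margin` is assumed here to leave headroom for special tokens; the
    rationale for the exact value 8 is what the issue is asking about.
    """
    return context_len - max_new_tokens - margin

# e.g. a 2048-token context with 512 new tokens leaves 1528 for the prompt
print(max_source_length(2048, 512))  # 1528
```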
Since version v0.2.5, the `stop` parameter in the OpenAI API seems to be set directly from `conv.stop_str` rather than from the request. https://github.com/lm-sys/FastChat/blob/v0.2.5/fastchat/serve/api.py#L134 In version v0.2.3, it worked when set in the request....
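A minimal sketch of the behavior the reporter expects: honor a client-supplied `stop`, and fall back to the conversation template's default only when the request omits it. The function name and dict-based request are illustrative, not the FastChat API's actual code.

```python
def resolve_stop(request, conv_stop_str):
    """Prefer the client-supplied 'stop' field; fall back to the template default."""
    stop = request.get("stop")
    return stop if stop is not None else conv_stop_str

print(resolve_stop({"stop": "###"}, "</s>"))  # ###
print(resolve_stop({}, "</s>"))               # </s>
```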
I've been using Vicuna for Question-Answering. I'm using the [py-bindings](https://github.com/abetlen/llama-cpp-python) (llama-cpp-python) and [LangChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/llamacpp.html). My prompt template is: ``` template = """Use the following pieces of context to answer the question...
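The reporter's template is truncated above. As an illustration of how such a context-stuffing QA prompt is typically assembled before being sent to the model, here is a sketch with placeholder wording (the template text below is not the reporter's actual template):

```python
# Illustrative QA prompt template in the style described above.
TEMPLATE = """Use the following pieces of context to answer the question.
If you don't know the answer, say so instead of making one up.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context, question):
    """Fill the template's {context} and {question} slots."""
    return TEMPLATE.format(context=context, question=question)

prompt = build_prompt("Vicuna is fine-tuned from LLaMA.", "What is Vicuna based on?")
print(prompt)
```

LangChain's `PromptTemplate` wraps the same idea with declared `input_variables`; the plain `str.format` version keeps the sketch dependency-free.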
https://huggingface.co/mosaicml/mpt-1b-redpajama-200b
Byte deltas
Instead of using parameter deltas, this implementation compares each byte of the delta with the corresponding byte of the LLaMA model and outputs the Vicuna model. This offers significantly less RAM usage compared...
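The description does not specify the per-byte operation, so here is a minimal sketch assuming the delta is the byte-wise difference mod 256 (so reconstruction is byte-wise addition mod 256), processed in fixed-size chunks so peak memory stays flat regardless of model size. The encoding is an assumption; the actual PR may use a different operation.

```python
def apply_byte_delta(base, delta, chunk_size=1 << 20):
    """Reconstruct target bytes as (base + delta) mod 256, chunk by chunk.

    Streaming over fixed-size chunks keeps peak memory at O(chunk_size)
    instead of O(model size). The additive-mod-256 encoding is assumed,
    not taken from the PR.
    """
    assert len(base) == len(delta)
    for i in range(0, len(base), chunk_size):
        b = base[i:i + chunk_size]
        d = delta[i:i + chunk_size]
        yield bytes((x + y) % 256 for x, y in zip(b, d))

base = bytes([10, 200, 255])
delta = bytes([5, 100, 2])
target = b"".join(apply_byte_delta(base, delta))
print(list(target))  # [15, 44, 1]
```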
The Vicuna tokenizer has no extra '\n' characters; the T5 tokenizer inserts them after each space. Reproduce: ```python from transformers import (T5TokenizerFast, T5ForConditionalGeneration, AutoTokenizer, LlamaTokenizer) t = T5TokenizerFast.from_pretrained('lmsys/fastchat-t5-3b-v1.0') text = 'I am a...
Hi there, I understand autoregressive decoding, which outputs tokens one by one. In a manual benchmark, our deployment generates 50 English words in 6 seconds. Is there a way to optimize...
Hi all, thanks a lot for the nice work introducing Vicuna and FastChat. I am a beginner in NLP (so correct me if I am wrong) and use GPUs with...