
FastChat-T5 4K context

Open tutankhamen-1 opened this issue 2 years ago • 5 comments

lmsys.org states that FastChat-T5 supports a context size of 4K. How do I get it to work? I get an error as soon as I go above 2K.

tutankhamen-1 avatar Jun 15 '23 21:06 tutankhamen-1

It can encode 2K tokens and output 2K tokens, for a total of 4K tokens. But it cannot take in 4K tokens at once, @tutankhamen-1. In contrast, Llama-like models share a single 2K budget between input and output.
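The difference between the two budget models described above can be sketched in a few lines. This is an illustrative sketch, not FastChat's actual validation code; the function names and constants are assumptions for the example.

```python
# Hypothetical sketch of the two context-budget models described above.
# These names are illustrative, not FastChat's real API.

ENCODER_LIMIT = 2048   # FastChat-T5: tokens the encoder can take in
DECODER_LIMIT = 2048   # FastChat-T5: tokens the decoder can produce
SHARED_LIMIT = 2048    # Llama-style: one budget for prompt + completion

def fits_encoder_decoder(prompt_tokens: int, completion_tokens: int) -> bool:
    """T5-style: the prompt and the completion each get their own 2K budget."""
    return prompt_tokens <= ENCODER_LIMIT and completion_tokens <= DECODER_LIMIT

def fits_decoder_only(prompt_tokens: int, completion_tokens: int) -> bool:
    """Llama-style: the prompt and the completion share one 2K budget."""
    return prompt_tokens + completion_tokens <= SHARED_LIMIT

# A 1790-token prompt with a 512-token completion fits the T5 split
# budget but exceeds the shared Llama-style budget (2302 > 2048).
print(fits_encoder_decoder(1790, 512))  # True
print(fits_decoder_only(1790, 512))     # False
```

So "4K context" here means 2K in plus 2K out, not a single 4K window.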

DachengLi1 avatar Jun 15 '23 23:06 DachengLi1

> It can encode 2K tokens and output 2K tokens, for a total of 4K tokens. But it cannot take in 4K tokens at once, @tutankhamen-1. In contrast, Llama-like models share a single 2K budget between input and output.

That’s great, but the 2K total limit seems to be hardcoded in many places and I can’t get it to work. I’m trying to use it through the API.

tutankhamen-1 avatar Jun 16 '23 06:06 tutankhamen-1

The current behavior should be correct. It can only encode 2K tokens, which is the hardcoded limit you are seeing, but it can output another 2K tokens on top of that. If you use Llama (Vicuna), it can encode 2K tokens in total, so if you give it a 2K-token prompt, it cannot output anything.

DachengLi1 avatar Jun 16 '23 15:06 DachengLi1

This is the error message I get:

This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.

Model: fastchat-t5-3b-v1.0

tutankhamen-1 avatar Jun 16 '23 15:06 tutankhamen-1

@tutankhamen-1 Thanks for letting us know! We will fix it. @merrymercy Let's change the error message for T5?

DachengLi1 avatar Jun 16 '23 15:06 DachengLi1

Isn't this limit somewhat arbitrary for T5, given its attention mechanism? My understanding is that memory grows quadratically with context length, but as long as you have the RAM to support it, a longer context isn't limited by the model itself.
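The quadratic-memory point can be made concrete with a back-of-envelope estimate of the attention score matrix. The head count and dtype below are illustrative assumptions, not measured values for FastChat-T5:

```python
# Back-of-envelope sketch of why attention memory grows quadratically
# with context length. Head count and bytes-per-value are illustrative
# assumptions, not FastChat-T5's actual configuration.

def attention_matrix_bytes(seq_len: int, num_heads: int = 32,
                           bytes_per_value: int = 2) -> int:
    """Size of one layer's attention score matrix: heads x n x n values."""
    return num_heads * seq_len * seq_len * bytes_per_value

for n in (2048, 4096, 8192):
    mib = attention_matrix_bytes(n) / 2**20
    print(f"{n:5d} tokens -> {mib:6.0f} MiB per layer")
```

Doubling the context length quadruples this matrix, so the 2K limit is a capacity/quality choice rather than a hard architectural wall, though models also tend to degrade beyond the lengths they were trained on.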

Taytay avatar Jun 30 '23 18:06 Taytay

@tutankhamen-1 Could you help us fix the bug and contribute a pull request?

merrymercy avatar Jul 05 '23 09:07 merrymercy