[Playground] Long generation cannot be completed
Describe the bug: In the playground, when the generation is long, it cannot finish in a single response. In that case I typed "continue" to let it keep going, but it didn't work: it either repeated from the beginning or returned an empty response.
Information about your version: tabby 0.7.0
Information about your GPU
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:02:00.0 Off | N/A |
| 30% 26C P8 21W / 350W | 22253MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:03:00.0 Off | N/A |
| 30% 26C P8 28W / 350W | 13576MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
For now, there is a hard limitation of 2048 input tokens and a maximum of 1920 output tokens. We might consider increasing these numbers in the future.
If the output exceeds 1920 tokens (or whatever larger maximum applies in the future), is there any way to let it continue so that everything gets output?
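For example, would a client-side loop along these lines make sense? (Rough sketch only: `Token` and the `generate` closure are placeholders rather than Tabby's actual API, and the 2048/1920 budgets are just the limits mentioned above.)

```rust
type Token = u32;

const MAX_INPUT_TOKENS: usize = 2048;
const MAX_OUTPUT_TOKENS: usize = 1920;

/// Keep requesting completions, feeding the tail of what has been generated so far
/// back in as context, until the model stops on its own (i.e. returns fewer tokens
/// than the output cap).
fn continue_until_done<F>(prompt: &[Token], mut generate: F) -> Vec<Token>
where
    // `generate` stands in for one completion request; it returns at most
    // MAX_OUTPUT_TOKENS tokens per call.
    F: FnMut(&[Token]) -> Vec<Token>,
{
    let mut output: Vec<Token> = Vec::new();
    // Bounded iteration count so a model that always fills the cap cannot loop forever.
    for _ in 0..32 {
        // Next request: original prompt plus everything generated so far,
        // truncated from the left so it fits the input window.
        let mut context: Vec<Token> = prompt.to_vec();
        context.extend_from_slice(&output);
        let start = context.len().saturating_sub(MAX_INPUT_TOKENS);
        let chunk = generate(&context[start..]);
        let finished = chunk.len() < MAX_OUTPUT_TOKENS;
        output.extend(chunk);
        if finished {
            break;
        }
    }
    output
}

fn main() {
    // Dummy generator for illustration: returns a short fixed chunk, so the loop
    // finishes after one call.
    let answer = continue_until_done(&[1, 2, 3], |_ctx| vec![42, 43, 44]);
    println!("{answer:?}");
}
```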
Is there a reason why the context length is static for Tabby? @wsxiaoys
Hey - could you elaborate a bit? A static context length is kind of an intrinsic thing for transformer-based LLMs.
Sorry. Depending on the model, the context can be increased to a certain size. You state that there is a hard limit on input and output tokens; is that hard-coded in Tabby, or does it vary with the model being used?
Ah - I get your point. It does make sense to read this value from either the registry or from the gguf files directly. Filing https://github.com/TabbyML/tabby/issues/1402 to track.
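For reference, a minimal sketch of what reading that value straight out of a GGUF file could look like, hand-rolling the metadata parsing (assuming the GGUF v2/v3 layout; `model.gguf` is a placeholder path, and this is not Tabby's actual implementation):

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut b = [0u8; 4]; r.read_exact(&mut b)?; Ok(u32::from_le_bytes(b))
}
fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut b = [0u8; 8]; r.read_exact(&mut b)?; Ok(u64::from_le_bytes(b))
}
fn read_string(r: &mut impl Read) -> io::Result<String> {
    let len = read_u64(r)? as usize; // GGUF v2+ strings are u64-length-prefixed
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}

// Consume one metadata value of the given GGUF type; return integer-like values as u64.
fn read_value(r: &mut impl Read, ty: u32) -> io::Result<Option<u64>> {
    Ok(match ty {
        0 | 1 | 7 => { // u8 / i8 / bool
            let mut b = [0u8; 1]; r.read_exact(&mut b)?; Some(b[0] as u64)
        }
        2 | 3 => { // u16 / i16
            let mut b = [0u8; 2]; r.read_exact(&mut b)?; Some(u16::from_le_bytes(b) as u64)
        }
        4 | 5 | 6 => { // u32 / i32 / f32 (4 bytes each)
            let v = read_u32(r)?; if ty == 6 { None } else { Some(v as u64) }
        }
        10 | 11 | 12 => { // u64 / i64 / f64 (8 bytes each)
            let v = read_u64(r)?; if ty == 12 { None } else { Some(v) }
        }
        8 => { read_string(r)?; None } // string: skip
        9 => { // array: element type, length, then elements
            let (elem_ty, len) = (read_u32(r)?, read_u64(r)?);
            for _ in 0..len { read_value(r, elem_ty)?; }
            None
        }
        _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "unknown GGUF value type")),
    })
}

// Scan the metadata section for `<arch>.context_length` (e.g. "llama.context_length").
fn gguf_context_length(path: &str) -> io::Result<Option<u64>> {
    let mut r = BufReader::new(File::open(path)?);
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    let (_version, _tensors, kv_count) = (read_u32(&mut r)?, read_u64(&mut r)?, read_u64(&mut r)?);
    let mut ctx_len = None;
    for _ in 0..kv_count {
        let key = read_string(&mut r)?;
        let ty = read_u32(&mut r)?;
        let value = read_value(&mut r, ty)?;
        if key.ends_with(".context_length") { ctx_len = value; }
    }
    Ok(ctx_len)
}

fn main() -> io::Result<()> {
    // "model.gguf" is a placeholder path.
    match gguf_context_length("model.gguf")? {
        Some(n) => println!("context_length = {n}"),
        None => println!("context_length not found"),
    }
    Ok(())
}
```

llama.cpp also exposes the trained context length (via `llama_n_ctx_train`) once a model is loaded, which may be a simpler route than parsing the file directly.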