
Show the total number of tokens and generation speed in chat UI (#2243)

kha84 opened this issue


What's been done:

Two small changes:

  1. one to modules/text_generation.py, to expose the number of tokens identified as "context" and "generated output" in the last call of generate_reply_HF / generate_reply_custom
  2. another to modules/ui_chat.py, to show those token counts in a similar way to how they are shown on the "Default" tab (a rough sketch of the idea follows below)
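
For illustration, here is a minimal sketch of that idea; all names below are hypothetical stand-ins, not the PR's actual code. The generation module records the counts from its last call, and the chat UI module formats them for display:

```python
# Illustrative sketch only -- hypothetical names, not the PR's actual code.

# modules/text_generation.py: remember the counts from the last generation call.
last_token_stats = {"context": 0, "generated": 0}

def record_token_stats(context_tokens: int, generated_tokens: int) -> None:
    # Would be called at the end of generate_reply_HF / generate_reply_custom.
    last_token_stats["context"] = context_tokens
    last_token_stats["generated"] = generated_tokens

# modules/ui_chat.py: format the counts for display in the chat tab,
# similar to the counter already shown on the "Default" tab.
def format_token_stats() -> str:
    c, g = last_token_stats["context"], last_token_stats["generated"]
    return f"Context: {c} tokens | Output: {g} tokens"
```

In the actual change, the formatted string would be wired to a small Gradio component in the chat tab.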

kha84 avatar Jan 23 '24 15:01 kha84

[screenshot of the token counts shown in the chat UI]

kha84 avatar Jan 23 '24 22:01 kha84

Added tokens per second to the display; still WIP. [Screenshot from 2024-01-29 00-01-49]

kha84 avatar Jan 28 '24 21:01 kha84

Need to polish the Gradio part; I don't like the place where it is currently displayed.

kha84 avatar Jan 28 '24 21:01 kha84

Why tho? It's already displayed in the cmd window. Adding it to the UI would be unnecessary unless you're using it through the API, in which case it would be helpful. But other than that, I don't see why people would want to know the speed and token counts if they are just interested in the output text itself. [screenshot]

However, if you are determined to add it, I think the best placement would be somewhere here. The current placement is in the way of the user's vision and might obscure the text they want to read. [screenshot]

YakuzaSuske avatar Feb 09 '24 19:02 YakuzaSuske

Yeah, I was thinking about that placement as well, thanks!

Answering your "why" question:

  1. having this crucial info hidden somewhere in the logs isn't very handy. To see these figures you have to jump back and forth between the web UI and the CLI:
  • For instance, one might want to benchmark different quants or formats of the same model to see how fast they are compared to one another. With these figures exposed in the UI, you could do everything in one place.
  • I also see a lot of people publicly sharing their "tps" figures as a wild guess, without even knowing where to look them up. This should help them as well.
  2. the amount of context used is quite an important runtime parameter, even for casual chat-only users. Right now you have to blind-guess whether your chats have already run out of the context window, or jump to the CLI to check the logs. Having it displayed at all times tells you much more about whether you should still expect coherent answers from your model (see the sketch below).
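
To make both points concrete, here is a hedged sketch of the arithmetic involved (a hypothetical helper, not code from the PR): tokens per second is the number of generated tokens over elapsed wall-clock time, and context usage is the prompt length measured against the model's context window:

```python
import time

def generation_stats(generated_tokens: int, start: float, end: float,
                     context_tokens: int, max_context: int) -> str:
    # Hypothetical helper: combines tokens/sec with context-window usage.
    elapsed = max(end - start, 1e-6)  # guard against division by zero
    tps = generated_tokens / elapsed
    used_pct = 100.0 * context_tokens / max_context
    return (f"{tps:.2f} tokens/s, {generated_tokens} tokens generated, "
            f"context {context_tokens}/{max_context} ({used_pct:.0f}% used)")

# Example: 120 tokens in 4 seconds with a 3000-token prompt and a 4096-token window.
start = time.time()
print(generation_stats(120, start, start + 4.0, 3000, 4096))
```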

kha84 avatar Feb 09 '24 20:02 kha84

I think for me, in my case, it just seems kind of odd, but... tbh I always have the command window open and off to the side, so I guess that's why I asked the question. I thought people always did that too, but 🤷‍♂️ I guess I'm the only one who does, which is why I found it particularly odd: I can see the log, tokens per second, context, etc. at all times just by glancing to my left real quick.

As seen below: [screenshot of the command window open beside the UI]

YakuzaSuske avatar Feb 10 '24 07:02 YakuzaSuske

Why not put it under or next to the {{char}} icon like SillyTavern does?

It shows the number of tokens and the duration.

I think this is a good addition to the UI as I don't have the command window showing.

biship avatar Feb 12 '24 13:02 biship

@YakuzaSuske not everyone interacts with it on the same computer it's running on. I also use it on my phone; sometimes I'm curious, and I'd rather not open a terminal, ssh in, find the program, watch the logs, and then swap between programs.

I think this is a great inclusion and would love to see it implemented

bartowski1182 avatar Feb 15 '24 17:02 bartowski1182

> @YakuzaSuske not everyone interacts with it on the same computer it's running on. I also use it on my phone; sometimes I'm curious, and I'd rather not open a terminal, ssh in, find the program, watch the logs, and then swap between programs.
>
> I think this is a great inclusion and would love to see it implemented

This is the correct answer to "why". There are many setups (remote access, runpod) where you don't always get to see the terminal.

FartyPants avatar Apr 21 '24 14:04 FartyPants