
Recent changes causing short low-token responses with little to no RP Text.

Open AlexysLovesLexxie opened this issue 3 years ago • 19 comments

Describe the bug

On Sunday, March 12, 2023, I was able to have good roleplay with my bot, receiving long responses with a high number of tokens per response. I am running on CPU, so I was receiving a response within 60-350 seconds (average about 150 seconds). The bot would use *roleplay* tags in its responses, and was generating responses containing up to 6 lines of text.

I updated to the latest version of the one-click installer using the install.bat script on 14 March 2023, and updated again on 15 March 2023, continuing with the same chat as before (which had never given me problems up until now, and contained a large number of *roleplay*-rich responses).

After the recent updates, the generation times have dropped massively, but I am only receiving short, one-line responses with absolutely no *roleplay* whatsoever, or only a very small amount. The character is now also confusing roles and not responding correctly. Tokens generated per message have dropped: before the update, 30-90 tokens were being generated per response; now responses are generally only 9-20 tokens.

Would it be possible to return to the old method of response generation, or fix the response generation so that it returns to being more roleplay-capable?

I am using Pygmalion 6B as downloaded on Feb. 11, 2023 using the old download script (by selecting PygmalionAI/Pygmalion-6b).

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

This simply started after updating to any version past 12 March 2023, using my old chat log and a character that had been working very well up until this point.

Screenshot

(Screenshot) There are more generations than responses shown here, as I tried several times to regenerate responses. I am RPing as a caregiver for a disabled bot, so please excuse the strange subject matter.

(Screenshot) My generation settings, which had been giving me amazing, high-quality, long responses with plenty of in-context *roleplay* from the bot up until this point.

(Screenshot) The quality of responses I was able to get pre-update. The *roleplay* aspect was much better before the recent changes.

Logs

No errors shown in the command line.

Loaded the model in 66.01 seconds.
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.
Output generated in 96.79 seconds (0.11 tokens/s, 11 tokens)
Output generated in 88.83 seconds (0.23 tokens/s, 20 tokens)
Output generated in 115.60 seconds (0.41 tokens/s, 47 tokens)
Output generated in 78.90 seconds (0.11 tokens/s, 9 tokens)
Output generated in 80.00 seconds (0.12 tokens/s, 10 tokens)
Output generated in 81.08 seconds (0.12 tokens/s, 10 tokens)
Output generated in 116.77 seconds (0.45 tokens/s, 52 tokens)

I do not have logs from before the changes, sorry.

System Info

I am using the CPU, not the GPU, as APUs are not supported.
Windows 11
CPU : Ryzen 7 6800H (boost freq. 4.7 GHz)
RAM : 32 GB (3 GB shared with the GPU)
Storage : 500 GB NVMe SSD (M.2)

AlexysLovesLexxie avatar Mar 15 '23 09:03 AlexysLovesLexxie

I should also mention that the chat settings tab now seems to show the example chat in the context box, which it didn't do before.


AlexysLovesLexxie avatar Mar 15 '23 09:03 AlexysLovesLexxie

Tested using Kawaii ("none" character).

Character will barely roleplay at all unless I am excessively verbose, and even then the roleplay descriptiveness is very limited (because of her terse description data?).

Turning up generation attempts increases character verbosity and the likelihood of roleplay by a small amount. The command-line log shows the following for a 2-attempt generation: (Screenshot)

And for a 3-attempt generation: (Screenshot)

And for a 4-attempt generation: (Screenshot)

Adding generation passes seems to increase time without increasing the length of responses by any significant amount. (Sorry for not including the dialog; I defaulted to NSFW in order to try to elicit more in-depth RP responses, and got none.) All responses generated were under 2 lines long, a far cry from what I had been achieving prior to the most recent updates.

AlexysLovesLexxie avatar Mar 15 '23 10:03 AlexysLovesLexxie

EDIT: Noticed that the OP is using Pygmalion. I'm using LLaMA 7B, so it might be a general issue rather than a model-specific issue.

Facing the same issue, but I'm using a GPU (1060 6GB). Responses were much more verbose and lengthy earlier, but now it always feels like I'm talking to a rude person who gives short responses, lol. I'm testing with the inbuilt Chiharu Yamada 'example' bot.

lolxdmainkaisemaanlu avatar Mar 15 '23 11:03 lolxdmainkaisemaanlu

Facing the same issue, but I'm using a GPU (1060 6GB). Responses were much more verbose and lengthy earlier, but now it always feels like I'm talking to a rude person who gives short responses, lol. I'm testing with the inbuilt Chiharu Yamada 'example' bot.

Super frustrating, as before the update this was the best bot I have ever tried for RP. With whatever happened, it has regressed severely. I wish I hadn't updated, or that there was full versioning available so I could roll back until they can fix this.

AlexysLovesLexxie avatar Mar 15 '23 12:03 AlexysLovesLexxie

I should also mention that the chat settings tab now seems to show the example chat in the context box, which it didn't do before.

That's #119 and I'm not happy with it either

Xabab avatar Mar 15 '23 16:03 Xabab

I played with this some more, after testing larger models like LLaMA that just keep generating and generating with the right preset.

Yes, the example dialog appears to do nothing now. Even when I see it in the chat settings, it doesn't have much effect on the style of writing. The greeting message has more impact: on characters where that is long, they are more likely to write long sentences.

Pygmalion is giving me one or two sentences and, like you said, <20 tokens.

But isn't this the way tavern/kobold do it too? I thought the example dialog was sent with the context every time... and previously here it went into the chat history? Isn't all of that context?

Unfortunately I can't see what happens behind the scenes here, unlike with kobold.

Ph0rk0z avatar Mar 15 '23 17:03 Ph0rk0z

I played with this some more, after testing larger models like LLaMA that just keep generating and generating with the right preset.

The thing is, I have never used other/larger models. I have always been using Pygmalion 6B, the version that was available on Feb. 11, 2023.

Yes, the example dialog appears to do nothing now. Even when I see it in the chat settings, it doesn't have much effect on the style of writing. The greeting message has more impact: on characters where that is long, they are more likely to write long sentences.

That is not how it used to work, at least not for me, as you can see from this screenshot: https://user-images.githubusercontent.com/126999069/225268766-57d53e3c-18a5-453e-975a-1b0490ad5a08.png This dialog was from a couple of days prior. As you can see, lots of *roleplay* text and lots of lines of dialog. The only thing that has changed in my setup is that I updated Oobabooga, meaning that this change was caused by a change in Oobabooga's backend generation code.

Pygmalion is giving me one or two sentences and, like you said, <20 tokens.

Previously, most responses were in the 30+ token range, with 60-80 token responses being common. Responses were longer, and the model would roleplay properly; it wasn't a struggle to get it to output longer text.

But isn't this the way tavern/kobold do it too? I thought the example dialog was sent with the context every time... and previously here it went into the chat history? Isn't all of that context?

I don't know; I have never used Tavern, and I have only done the smallest amount of experimentation with Kobold, and not locally, as I did not see the option for pure-CPU operation in any Kobold setups I tried. I do know that the example dialog was never visible before this started happening.

I also know that a while back, Oobabooga used to be able to send the entire chat history as context, but that was changed, and it has only sent 2048 tokens as context for quite some time (this change happened way before this problem started, though, so I don't think the two are related).

Unfortunately I can't see what happens behind the scenes here, unlike with kobold.

I wish that Oobabooga's --verbose flag showed more than just what is being sent as the prompt. It might help us see what is happening here.

Either way, something fundamental has changed within the last 2-3 days, and text generation no longer functions as it did prior to those changes. Perhaps it is a change made by Oobabooga? Maybe he could shed some light on the situation.

AlexysLovesLexxie avatar Mar 15 '23 18:03 AlexysLovesLexxie

The simplest solution would be to find the commits that caused this and update only up to the prior one.

2048 is the limit for most of these models besides RWKV. I think that has been with us for a long time.

Maybe @Xabab knows when the change happened because it sounds like it was 2 weeks ago. Can also just change the behavior once we know and see if it makes a difference.

Ph0rk0z avatar Mar 15 '23 19:03 Ph0rk0z

The simplest solution would be to find the commits that caused this and update only up to the prior one.

2048 is the limit for most of these models besides RWKV. I think that has been with us for a long time.

Maybe @Xabab knows when the change happened because it sounds like it was 2 weeks ago. Can also just change the behavior once we know and see if it makes a difference.

I would be okay with going back to an earlier commit, but then I would lose the ability to run the LLaMA 7B model in 4-bit, which is the only way it will run on my 1060 6GB.

lolxdmainkaisemaanlu avatar Mar 15 '23 20:03 lolxdmainkaisemaanlu

The simplest solution would be to find the commits that caused this and update only up to the prior one.

Is this possible using the one-click installer, or would that be something you would have to do on your end?

2048 is the limit for most of these models besides RWKV. I think that has been with us for a long time.

Fair. I think what happened was that the way the UI references the max value being sent was changed.

Maybe @Xabab knows when the change happened because it sounds like it was 2 weeks ago. Can also just change the behavior once we know and see if it makes a difference.

The change that caused this was only a few days ago, not weeks.

It was working fine on Saturday/Sunday, March 11th/12th; I was getting great responses then. I updated on Monday, March 13th, but only had a chance to do a quick couple of messages before I had to go to bed, which was when I initially noticed it. I did another update on March 14th, and that was when I saw that Monday's behavior wasn't just a one-off.

And I don't change models - I always use Pygmalion6b, and I haven't updated that since I initially downloaded it on Feb. 11, so that isn't the issue.

I update (by re-running install.bat) every time I use Ooba, as I always want to have the latest and greatest, and so that if there are any issues I can report them to help get them solved. It seems I should start taking a backup of everything before I update (and TBH I don't know why I didn't; this isn't my first time using software that's in active development).

AlexysLovesLexxie avatar Mar 15 '23 20:03 AlexysLovesLexxie

Anyone manage to figure out how to roll back, or what to roll back to? Again, I use the one-click installer, so I am unclear on how to do this myself.

Hoping to maybe hear from the project staff at some point.

AlexysLovesLexxie avatar Mar 16 '23 10:03 AlexysLovesLexxie

I found the commit: https://github.com/oobabooga/text-generation-webui/commit/e861e68e3848d720fd8e3cb5dee9fe9f54c88657

Like you said, 2 weeks ago, but anyone else who wants to get rid of it can probably do a git revert in their local copy.

As to what else is doing that, who knows... so many changes. You can browse the repo if you don't use git. Click a commit like this: https://github.com/oobabooga/text-generation-webui/commit/a95592fc56929fe1ba55ec30b41800de614bb4fd

Then hit "Browse files" and you can download a zip of the repo at that commit. That is your "backup", which you can then replace your files with.

If you open that batch file here or in a text editor you can see what it's doing.
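
For anyone who wants to try this, here is a minimal command sketch of the two approaches just described (it assumes the one-click installer left a normal git clone in a text-generation-webui folder; the folder name and your last-known-good commit are things you would need to check yourself):

```
cd text-generation-webui

# Option 1: roll the whole working copy back to a known-good point
git log --oneline              # find the hash of the last commit that worked for you
git checkout <known-good-hash>

# Option 2: stay on the current code but undo only the suspect commit found above
# (may require resolving conflicts if later changes touched the same files)
git revert e861e68
```

Option 1 is the "backup" idea done with git instead of a downloaded zip; Option 2 keeps everything else (such as 4-bit LLaMA support) and only reverts the one change.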

Ph0rk0z avatar Mar 16 '23 11:03 Ph0rk0z

I found the commit: e861e68

Like you said, 2 weeks ago, but anyone else who wants to get rid of it can probably do a git revert in their local copy.

Strange. I'm trying to find where I said it was 2 weeks ago. Sunday 13 March is not even 1 week ago. That's when it was last working properly, as per my screenshot here: https://user-images.githubusercontent.com/126999069/225268766-57d53e3c-18a5-453e-975a-1b0490ad5a08.png

I just hope someone from the dev team can actually address the issue and fix it, or at least explain why such a serious regression in *roleplay* performance was deemed necessary.

I will try to roll it back tomorrow, but I am hoping that they will properly restore the functionality that we had before Sunday, Mar. 13 2023.

AlexysLovesLexxie avatar Mar 16 '23 11:03 AlexysLovesLexxie

I meant that you said 2 weeks was too long ago. But other people here really want the chat history in a different place.

Have to find out what actually did it. The dev "team" is ooba and that's it.

Ph0rk0z avatar Mar 16 '23 11:03 Ph0rk0z

Aah fair.

Yeah, hopefully @oobabooga can get this figured out.


AlexysLovesLexxie avatar Mar 16 '23 12:03 AlexysLovesLexxie

  1. You are using an older version of the web UI in the screenshots
  2. If the example dialogue is messing up the quality of your character, just remove it from the JSON file. No need to change the code.
  3. Make sure that "Stop generating at new line character?" is not selected

Otherwise, I can't think of anything. Chiharu is passing the Hi test with the Debug preset with the same response that she gave 2 months ago

Chiharu smiles and looks up at you, her face lighting up as she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She's very tall, and her long legs are wrapped around the other side. She extends a hand towards you

Hi, I'm Chiharu Yamada. It's so nice to meet you!
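
For reference on point 2: a character in the characters folder is just a small JSON file, and the sketch below shows roughly where the example dialogue lives. The field names are assumptions based on the Pygmalion-style format the web UI used at the time (check characters/Example.json in your install for the exact keys); removing the example-dialogue entry is all that is being suggested.

```
{
  "char_name": "Example",
  "char_persona": "A short description of the character's personality and speech style.",
  "char_greeting": "Hello! *waves cheerfully*",
  "world_scenario": "",
  "example_dialogue": "You: Hi there.\nExample: *smiles* Hi! It's great to see you."
}
```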

oobabooga avatar Mar 17 '23 13:03 oobabooga

Why is your max_new_tokens so high? This is removing many old messages from the prompt to make space for a 2-page reply that will never come. That's probably the real issue.

Try reducing this number to 200.
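
For rough context, assuming the usual 2048-token window these models have (mentioned above): with max_new_tokens at 1000, only about 1048 tokens are left for the character definition and chat history, whereas max_new_tokens=200 leaves roughly 1848 tokens, so far more of the earlier conversation stays in the prompt.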

oobabooga avatar Mar 17 '23 13:03 oobabooga

  1. You are using an older version of the web UI in the screenshots
  2. If the example dialogue is messing up the quality of your character, just remove it from the JSON file. No need to change the code.
  3. Make sure that "Stop generating at new line character?" is not selected

Otherwise, I can't think of anything. Chiharu is passing the Hi test with the Debug preset with the same response that she gave 2 months ago

Chiharu smiles and looks up at you, her face lighting up as she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She's very tall, and her long legs are wrapped around the other side. She extends a hand towards you Hi, I'm Chiharu Yamada. It's so nice to meet you!

  1. Those screenshots were taken from whatever version was current as of the day I reported this bug. I had literally just updated by running install.bat. If this method isn't pulling in all the updates, maybe we need an Update.bat that does pull the necessary files for a full update.
  2. The example dialog was working just fine before the March 13/14 updates. In fact, it was the only reason that my character was properly remembering some details.
  3. Pretty sure it wasn't selected. I will double-check tonight.

AlexysLovesLexxie avatar Mar 17 '23 17:03 AlexysLovesLexxie

Why is your max_new_tokens so high? This is removing many old messages from the prompt to make space for a 2-page reply that will never come. That's probably the real issue.

Try reducing this number to 200.

That's odd; I have been using max_new_tokens=1000 for weeks and receiving excellent responses. The only thing I had changed between Sunday, Mar. 12 and when I reported this bug was running install.bat to update the packages.

AlexysLovesLexxie avatar Mar 17 '23 17:03 AlexysLovesLexxie

That doesn't mean you should be using max_new_tokens=1000 when your average reply size is less than 100 tokens. 900 tokens of history are being wasted.

oobabooga avatar Mar 17 '23 18:03 oobabooga

That doesn't mean you should be using max_new_tokens=1000 when your average reply size is less than 100 tokens. 900 tokens of history are being wasted.

Fair. I can't remember if I had asked for this before, but would it be possible to add tooltips or an info panel to explain the various settings? Or at least add the info to the readme? I feel like that would be quite useful for more people than just myself.

I will back up my models directory and do a fresh install to make sure I am running the latest version, and update this issue tonight after I get home from work.

AlexysLovesLexxie avatar Mar 17 '23 21:03 AlexysLovesLexxie

+1 for the hover tooltips, would be neat https://getbootstrap.com/docs/4.0/components/tooltips/

Xabab avatar Mar 17 '23 22:03 Xabab

  1. You are using an older version of the web UI in the screenshots
  2. If the example dialogue is messing up the quality of your character, just remove it from the JSON file. No need to change the code.
  3. Make sure that "Stop generating at new line character?" is not selected

Otherwise, I can't think of anything. Chiharu is passing the Hi test with the Debug preset with the same response that she gave 2 months ago

Chiharu smiles and looks up at you, her face lighting up as she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She's very tall, and her long legs are wrapped around the other side. She extends a hand towards you Hi, I'm Chiharu Yamada. It's so nice to meet you!

@oobabooga

Upgraded to the newest version of the WebUI. Still not getting the quantity and quality of RP dialog I had been getting with the previous install, although I have only been using Kawaii so far. I will load "Katie" in tomorrow and see if I can get her to produce any better results. I am currently using the default profile for Pygmalion, but have turned the generation attempts up to 3.

(Screenshot 2023-03-18 055714)

Produced this line of dialog: (Screenshot 2023-03-18 055836)

AlexysLovesLexxie avatar Mar 18 '23 12:03 AlexysLovesLexxie

Since you are mainly using Pygmalion, you can try running an old version side by side and compare.

Ph0rk0z avatar Mar 18 '23 13:03 Ph0rk0z

I don't think there is an issue.

oobabooga avatar Mar 18 '23 13:03 oobabooga