VldmrB
Have you done this? https://github.com/oobabooga/text-generation-webui/issues/173#issuecomment-1456087035
There's a link to another comment in that post https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1456040134
Done, please take a look. Sorry about the force pushes, barely used git/GitHub in a long time. Let me know if I should drop this one and create another with...
The bot talks to itself a lot, e.g.
```
This is a conversation between two people.
Person 1: Hello!
Person 2: Hi, how are you?
Person 2: I'm fine thank...
```
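For context, chat UIs typically deal with this by cutting the model's output at the first occurrence of a speaker prompt string, discarding any lines the model generated for the other persona. A minimal sketch of that idea (the function name and details are illustrative, not the webui's actual code):

```python
# Sketch: truncate generated text at the earliest stop string, so the
# model's attempt to continue the conversation as the other speaker
# ("Person 2: ...") is thrown away instead of shown to the user.
def trim_at_stop_strings(text, stop_strings):
    cut = len(text)
    for s in stop_strings:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)  # keep only text before the earliest stop
    return text[:cut].rstrip()

raw = "I'm fine, thanks!\nPerson 2: That's great.\nPerson 1: Indeed."
print(trim_at_stop_strings(raw, ["Person 1:", "Person 2:"]))
# prints: I'm fine, thanks!
```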
There's still a difference between `continue from the same line` and `omit blank Person 1 prompts`. But I tried what you suggested, since this functionality could be moved out into...
All good, happy to see that it made it in. :)
4-bit 30b is slower than 8-bit 13b for me as well. I'm not sure if I ever tested it outside of chat mode. In chat mode, there's a long delay...
I tried it briefly in non-chat mode, and for me there's a similar delay there as well. However, since in non-chat mode it generates tokens until it hits the limit...
Thanks, @aljungberg! I tried it, and it's a lot quicker! There's still more of an initial delay than when using 8-bit 13B, but it's actually faster overall. Though, I...
Tried it some more: it's faster than before, though still slower than 8-bit 13B with larger contexts. I tried the `faster_kernel` option too; didn't discern a meaningful difference there. I...