Request: Some improvements to web app.py
As in the title. The web app is very nice, simple, and clean: it works without any fuss and doesn't carry the VRAM or other overhead of other existing inference frontends, and it's perfectly usable as is.
However, some nice-to-haves:
- Ability to continue the last reply instead of creating a new bot response when sending an empty message
- A cleaner raw text completion mode, for use beyond chat/instruct mode
- LoRA loading
- Custom stop tokens (see the first sketch after this list)
- Ability to save/load sampler presets/sessions (see the second sketch after this list)
- Better handling of \n in user/bot names (for using it as instruct rather than chat)
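
For the stop tokens point, something along these lines is what I have in mind. It's a rough, string-level sketch only, not based on the actual app.py internals; `apply_stop_strings` and `stop_strings` are made-up names:

```python
# Rough sketch only: "apply_stop_strings" and "stop_strings" are made-up names,
# not anything that exists in the web app today.

def apply_stop_strings(generated_text: str, stop_strings: list[str]) -> tuple[str, bool]:
    """Truncate the output at the earliest user-defined stop string, if any."""
    earliest = None
    for stop in stop_strings:
        idx = generated_text.find(stop)
        if idx != -1 and (earliest is None or idx < earliest):
            earliest = idx
    if earliest is not None:
        return generated_text[:earliest], True
    return generated_text, False

# e.g. apply_stop_strings("Reply.\n### Instruction:", ["### Instruction:"])
# -> ("Reply.\n", True)
```

The app would just check the growing output against the user's stop strings and cut generation off as soon as one shows up.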
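
And for sampler presets, even a plain JSON dump/load of the current sampler settings would cover it. Again just a sketch; the key names here are guesses, not the app's actual settings:

```python
import json

# Sketch only: the key list and the flat settings dict are assumptions,
# not the structures app.py actually uses.
SAMPLER_KEYS = ["temperature", "top_k", "top_p", "typical", "repetition_penalty"]

def save_preset(path: str, settings: dict) -> None:
    """Write the current sampler settings to a JSON preset file."""
    with open(path, "w") as f:
        json.dump({k: settings[k] for k in SAMPLER_KEYS if k in settings}, f, indent=2)

def load_preset(path: str) -> dict:
    """Read a sampler preset back as a plain dict."""
    with open(path) as f:
        return json.load(f)
```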