
Added a backend model for API-based usage. [Included WIP support for OpenRouter & Neuro.Mancer]

Iceblade02 opened this pull request • 15 comments

  • Edited three lines in aiserver.py: the first adds the menu item, the second avoids requesting GPU/CPU, and the third updates the ModelSelectionSchema class.
  • The openrouter_handler takes care of generations, with features such as:
    • Messy code
    • Some remaining TODOs
    • Core story generation, including support for story & simple chat
    • Some simple stoppers
    • Parallelized API calls for getting batched generations
    • and some comments

The code should not affect anything outside its scope, but it should definitely be looked over by someone more experienced than me.

Iceblade02 avatar Mar 12 '24 17:03 Iceblade02

Selecting it in the menu I get KeyError: 'OpenRouter' so something's not right. On further inspection the OpenRouter file can't load itself properly: ModuleNotFoundError: No module named 'modeling.inference_models.openrouter.class'; 'modeling.inference_models.openrouter' is not a package

For now, since you made the files OpenRouter-specific, it's also better to put them all in the subfolder; we can alter this later when we add modern OpenAI support based on this.

henk717 avatar Mar 12 '24 17:03 henk717

Oh, hang on, I accidentally included a previous WIP file. There should only be the "openrouter_handler" and "openrouter/class.py"; "openrouter.py" needs to be removed.

Iceblade02 avatar Mar 12 '24 17:03 Iceblade02

If you update your branch this PR will update automatically.

henk717 avatar Mar 12 '24 17:03 henk717

There, should be updated now.

Iceblade02 avatar Mar 12 '24 17:03 Iceblade02

https://github.com/scott-ca/KoboldAI-united is the incorrect header, that is a very outdated fork not associated with us.

henk717 avatar Mar 12 '24 17:03 henk717

When trying to submit my key it automatically removes the key and I end up with a blank field (the same key works in Lite).

henk717 avatar Mar 12 '24 17:03 henk717

Alright, I've done somewhat more thorough testing this time around, and it seems to be behaving properly now, at least on my end.

Iceblade02 avatar Mar 12 '24 20:03 Iceblade02

I've reworked the handler class in /modeling/inference_models to be more generic, and as a proof-of-concept added support for another API! Adding support for Neuro.Mancer took roughly an hour.

Iceblade02 avatar Mar 13 '24 18:03 Iceblade02

Moved GooseAI over to the new API handler.

Iceblade02 avatar Mar 17 '24 11:03 Iceblade02

GooseAI may be using the old implementation of OpenAI's API; this will have to be tested. The OpenAI API endpoint should be migrated, though, with support for custom URLs; this will allow all these newer hosting companies to work.

henk717 avatar Mar 17 '24 14:03 henk717

Currently, api_handler translates all calls to core_generate, raw_generate & _raw_generate into a call to the abstract function _raw_api_generate (unique to each specific endpoint handler), decoding the prompt if it has been tokenized and passing it as plaintext (along with the other params, unchanged). As long as _raw_api_generate returns a list of strings (["result A", "result B", ...]) it will work fine: the outputs are tokenized, standardized to equal length, and returned as a GenerationResult.
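For illustration, the dispatch described above could be sketched roughly like this. The class shape, the tokenizer interface, and the padding details are assumptions; only the name _raw_api_generate and the decode-delegate-retokenize flow come from the description.

```python
from abc import ABC, abstractmethod

class APIHandler(ABC):
    """Sketch of the generic handler: decode, delegate, re-tokenize."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer  # assumed to offer encode()/decode()

    @abstractmethod
    def _raw_api_generate(self, prompt: str, batch_count: int) -> list[str]:
        """Endpoint-specific: return one generated string per batch entry."""

    def raw_generate(self, prompt, batch_count=1):
        # Decode the prompt to plaintext if it arrived tokenized.
        if not isinstance(prompt, str):
            prompt = self.tokenizer.decode(prompt)
        results = self._raw_api_generate(prompt, batch_count)
        # Tokenize the outputs and pad them to a uniform length so they
        # can be packed into a rectangular GenerationResult.
        encoded = [self.tokenizer.encode(r) for r in results]
        longest = max(len(e) for e in encoded)
        pad_id = 0  # assumed pad-token id
        return [e + [pad_id] * (longest - len(e)) for e in encoded]
```

An endpoint handler then only has to implement _raw_api_generate and never touches tokens itself.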

api_handler also provides some default functions for making API requests (batch_api_call -> [json, json, ...], api_call -> json), each taking a call variable that should contain the url, json and headers (specified in _raw_api_generate for each endpoint handler).
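A minimal sketch of what such helpers might look like, assuming the call dict carries "url", "json" and "headers" keys as described; the function bodies and the optional fetch parameter are illustrative, not the PR's actual code.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def api_call(call: dict) -> dict:
    """POST call["json"] to call["url"] with call["headers"]; parse the JSON reply."""
    req = urllib.request.Request(
        call["url"],
        data=json.dumps(call["json"]).encode("utf-8"),
        headers={"Content-Type": "application/json", **call["headers"]},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))

def batch_api_call(calls: list[dict], fetch=None) -> list[dict]:
    """Run several API requests in parallel, preserving order."""
    fetch = fetch or api_call
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(fetch, calls))
```

Running the requests on a thread pool is what makes batched generations parallel rather than sequential round-trips.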

Things TODO in api_handler:

  • [ ] Modify it to keep track of messages as individual entries. Currently it doesn't, instead treating the entire input as a single unit. (I'm not sure how KoboldAI handles this, so I kept it as simple as possible.)
  • [ ] Work in support for streaming. (I'm neither sure how this works from client side nor on the API side, so didn't touch it.)
  • [ ] Improve handling of stoppers. The current implementation is very rudimentary and naive, just adding some hardcoded strings into a list depending on the requested stopper hooks.
  • [ ] Several settings should be moved to endpoint-specific handlers, or alternatively be included in the main settings menu. (The former is easy; I don't know how to do the latter.)
  • [ ] Move some response error handling from the default api_caller in api_handler to the specific endpoint handlers.
  • [ ] Error handling should probably be entirely reworked in api_handler. (Currently it's essentially copied from the old OpenAI/GooseAI handler.)
  • [ ] Make the default api functions more general and robust?
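To make the stopper item concrete: the "hardcoded strings in a list, depending on the requested stopper hooks" approach might look like the sketch below. The hook names and stop strings are hypothetical, not the PR's actual values.

```python
def apply_stoppers(text: str, hooks: list[str]) -> str:
    """Truncate a generation at the earliest matching stop string."""
    stop_strings = []
    if "chat" in hooks:
        stop_strings += ["\nYou:", "\nUser:"]  # assumed chat turn markers
    if "singleline" in hooks:
        stop_strings += ["\n"]
    # Cut at the earliest occurrence of any stop string.
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

This is naive in exactly the way the TODO describes: it only works on the finished text, so it cannot stop the API from generating (and billing for) tokens past the stop point.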

Iceblade02 avatar Mar 17 '24 15:03 Iceblade02

NOTE: I keep triggering the following error when tabbing into the web client, and sometimes when switching between tabs in the left menu.

(screenshot of the error omitted)

I don't know if it is a bug I've introduced, or if it comes from elsewhere. It feels like the latter, given that it triggers even without a model chosen, but idk.

EDIT: This seems to have been resolved

Iceblade02 avatar Mar 17 '24 15:03 Iceblade02

GooseAI apparently revoked the testing credit, so I will no longer be able to test their backend and will have to assume it works. Without testing credit it will move towards a best-effort basis. (Keep in mind their site is not compatible with the modern OpenAI standard, to my knowledge.)

I tried OpenRouter and this worked, but the presence penalty / frequency penalty is shown on the loading screen and that's not conformant with our standard. We should map the frequency penalty to the repetition penalty setting inside KoboldAI so that it can be modified without having to reload the model.
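One wrinkle in that mapping: KoboldAI's repetition penalty is multiplicative (1.0 = off), while OpenAI-style frequency penalty is additive (0.0 = off, roughly -2 to 2). A simple offset with clamping is one plausible translation; this sketch is an assumption, not the formula the PR ended up using.

```python
def rep_pen_to_frequency_penalty(rep_pen: float) -> float:
    """Map KoboldAI's multiplicative rep_pen (1.0 = off) onto an
    additive frequency_penalty (0.0 = off), clamped to the API's
    accepted range. Hypothetical mapping for illustration."""
    return max(0.0, min(2.0, rep_pen - 1.0))
```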

henk717 avatar Apr 03 '24 18:04 henk717

Repetition penalty is now properly tied to the setting inside the KAI front end.

Removing presence penalty & frequency penalty from the loading screen would require adding them as new items in gensettings.py (and maybe elsewhere?).

Do we want to add them there?

One potential downside is that we'll end up with a whole bunch of settings with different names that aren't actually used by a lot of models. Ideally, we'd "ask" the backend which settings it actually uses/implements, and only present those to the user.
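One way that "ask the backend" idea could look: each handler advertises the sampler settings it implements, and the UI filters the global list down to those. Entirely hypothetical; the class names and setting keys below are illustrative and nothing like this exists in the PR yet.

```python
class OpenRouterHandler:
    # Settings this backend actually implements (illustrative values).
    supported_settings = {"temperature", "top_p", "rep_pen",
                          "presence_penalty", "frequency_penalty"}

class NeuroMancerHandler:
    supported_settings = {"temperature", "top_p", "rep_pen"}

def visible_settings(handler, all_settings: list[str]) -> list[str]:
    """Show only the settings the chosen backend reports supporting."""
    return [s for s in all_settings if s in handler.supported_settings]
```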

Iceblade02 avatar Apr 16 '24 11:04 Iceblade02

Added them and it seems to work. Easy to undo if we want.

Iceblade02 avatar Apr 16 '24 13:04 Iceblade02