[CODE IMPROVEMENT] Improve functionality for separator and stop tokens

Open psinger opened this issue 2 years ago • 0 comments

🔧 Proposed code refactoring

Add the separator tokens to the tokenizer as special tokens. Potentially also add a separate setting that uses the separator tokens as stop tokens. At a minimum, we should make selecting stop tokens easier, for example by accepting a single delimited string that we split into a list.
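A minimal sketch of the "string list we are splitting" idea, assuming a comma as the delimiter. The function name `parse_stop_tokens` and the example token strings are hypothetical and not taken from the code base:

```python
from typing import List


def parse_stop_tokens(raw: str, delimiter: str = ",") -> List[str]:
    """Split a delimited string into an ordered, de-duplicated list of stop tokens.

    Hypothetical helper: the name and the default delimiter are assumptions.
    """
    tokens: List[str] = []
    for piece in raw.split(delimiter):
        token = piece.strip()  # drop whitespace around each entry
        if token and token not in tokens:  # skip empty entries and duplicates
            tokens.append(token)
    return tokens


# Example setting value as a user might type it (token strings are made up)
stop_tokens = parse_stop_tokens("<|prompt|>, <|answer|>, </s>")
```

One caveat with this sketch: because entries are stripped, a whitespace-only stop token such as a newline cannot be expressed this way and would need an escape convention.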

This also means we need to dump the modified tokenizer, so this issue also depends on https://github.com/h2oai/h2o-llmstudio/issues/5
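As a rough sketch of what this could look like with a Hugging Face style tokenizer and model: the helper name `register_separator_tokens` and its parameters are hypothetical, and only `add_special_tokens`, `resize_token_embeddings` and `save_pretrained` are assumed to exist on the objects passed in.

```python
from typing import List


def register_separator_tokens(tokenizer, model, separators: List[str], output_dir: str) -> int:
    """Register separators as special tokens and dump the tokenizer.

    Hypothetical helper: assumes `tokenizer` and `model` expose the
    Hugging Face transformers methods used below.
    """
    # add_special_tokens returns how many tokens were newly added to the vocabulary
    num_added = tokenizer.add_special_tokens(
        {"additional_special_tokens": separators}
    )
    if num_added > 0:
        # New token ids need embedding rows, so grow the embedding matrix
        model.resize_token_embeddings(len(tokenizer))
    # Save the modified tokenizer so the same vocabulary is available at inference
    tokenizer.save_pretrained(output_dir)
    return num_added
```

Once registered, each separator should map to a single token id instead of being split into several pieces. Depending on the transformers version, passing `additional_special_tokens` may replace a previously registered list rather than extend it, so that behavior is worth checking before relying on this.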

Motivation

Some separator tokens might currently be encoded as multiple tokens.

Also, it can be cumbersome to manually add new tokens to the list of stop tokens.

psinger, Apr 19 '23 16:04