h2o-llmstudio
[CODE IMPROVEMENT] Improve functionality for separator and stop tokens
🔧 Proposed code refactoring
Add the separator tokens as special tokens. Possibly also add a separate setting to use the separator tokens as stop tokens. At minimum, selecting stop tokens should be made easier, for example by accepting a single string that is split into a list of tokens.
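This also means the tokenizer has to be saved (dumped) together with the model, since the added special tokens would otherwise be lost.
This therefore also depends on https://github.com/h2oai/h2o-llmstudio/issues/5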
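A minimal sketch of the proposed stop-token handling, using only the standard library. The function names (`parse_stop_tokens`, `merge_stop_tokens`, `hits_stop_token`), the comma as list separator, the `use_separators_as_stop` flag, and the example tokens such as `<|prompt|>` are all hypothetical and not part of h2o-llmstudio. Registering the separators as special tokens would happen separately, e.g. via Hugging Face `tokenizer.add_special_tokens({"additional_special_tokens": [...]})`, which is not shown here.

```python
from typing import List, Sequence


def parse_stop_tokens(setting: str, separator: str = ",") -> List[str]:
    """Split a user setting like "<|endoftext|>,###" into a list of stop tokens.

    Hypothetical helper: strips surrounding whitespace, drops empty entries,
    and removes duplicates while keeping the original order.
    """
    tokens: List[str] = []
    for raw in setting.split(separator):
        token = raw.strip()
        if token and token not in tokens:
            tokens.append(token)
    return tokens


def merge_stop_tokens(
    separators: Sequence[str],
    extra: Sequence[str],
    use_separators_as_stop: bool,
) -> List[str]:
    """Combine separator tokens (if enabled) with user-provided stop tokens.

    Hypothetical helper mirroring the proposed "use separators as stop tokens"
    setting; the result is deduplicated and order-preserving.
    """
    combined: List[str] = list(separators) if use_separators_as_stop else []
    for token in extra:
        if token not in combined:
            combined.append(token)
    return combined


def hits_stop_token(generated_text: str, stop_tokens: Sequence[str]) -> bool:
    """Return True if the generated text currently ends with any stop token."""
    return any(generated_text.endswith(token) for token in stop_tokens)


if __name__ == "__main__":
    separators = ["<|prompt|>", "<|answer|>"]  # hypothetical separator tokens
    extra = parse_stop_tokens("<|endoftext|>, ###,,<|prompt|>")
    stop_tokens = merge_stop_tokens(separators, extra, use_separators_as_stop=True)
    print(stop_tokens)
    print(hits_stop_token("Sure, here you go.<|prompt|>", stop_tokens))
```

Stripping whitespace makes the setting forgiving to type, but it also means a stop token that intentionally starts or ends with a space cannot be expressed this way; a real implementation might need a different delimiter or quoting.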
Motivation
Some separator tokens may currently be split into multiple tokens by the tokenizer.
Also, it can be cumbersome to manually add new tokens to the list of stop tokens.