yoyodyne

Subword tokenization

Open bonham79 opened this issue 1 year ago • 4 comments

What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this: just tokenize your input and use the delineation function. But I wonder whether it would be worthwhile to also write the scripts so that this can be managed by the repo as well.
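
For concreteness, here is a minimal sketch of what such a preprocessing script might look like. It is not code from this repo: `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical names, and the space separator is an assumption, so use whatever delimiter your data format is configured to split on.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of strings."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges, sep=" "):
    """Segment `word` with the learned merges and join the pieces with `sep`."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return sep.join(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(apply_bpe("lower", merges))
```

The output of `apply_bpe` is a delimited string, so each source or target field could be rewritten this way before training and then split on the same separator. Merges would presumably be learned on the training split only and reused for the dev and test splits.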

bonham79, Feb 08 '24 17:02