yoyodyne Subword tokenization

Open bonham79 opened this issue 1 year ago • 4 comments

What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this: just tokenize your input and use the delineation function. But I wonder whether it would be worthwhile to also write the scripts so that this step can be managed by the repo as well.
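For concreteness, here is a minimal sketch of what such a preprocessing script might do. It is only an illustration of plain BPE, not a proposal for the actual implementation: the names `learn_bpe`, `merge_pair`, and `encode` are hypothetical, nothing here calls into the repo, and the separator is an assumption that would have to match whatever the delineation step expects.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replaces each adjacent occurrence of `pair` in `symbols` with one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learns up to `num_merges` merges from an iterable of words (hypothetical helper)."""
    # Every word starts out as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Counts adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # Nothing left to merge.
        # Ties go to the pair that was counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges, sep=" "):
    """Applies the learned merges in order and joins the subwords with `sep`."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return sep.join(symbols)
```

With the toy list `["low", "low", "lower", "lowest"]` and two merges, `encode("lowest", merges)` gives `"low e s t"`. A real script would probably wrap an existing BPE library rather than reimplement the merge loop, and would read and write the data files directly.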

Feb 08 '24 17:02 bonham79
