YouTokenToMe
Vocabulary contains underscore multiple times?
After training, if I write out the vocabulary:
for w in bpe.vocab():
    fh.write(f'{w}\n')  # fh is an open file handle
and then look inside the file, this is (a subset of) what I see:
_
8
-
7
3
6
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
Why is this?
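
One way to check whether these entries are really identical would be to print the code points of each token, since lines that look the same in an editor can differ in characters that are invisible or rendered alike. The following is only a minimal sketch: `describe_tokens` is a hypothetical helper, and `sample_vocab` is made-up data standing in for the result of `bpe.vocab()`.

```python
def describe_tokens(tokens):
    """Return one line per token: its repr, its length, and its code points."""
    lines = []
    for tok in tokens:
        # Spell out every character as U+XXXX so look-alike symbols become distinguishable.
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in tok)
        lines.append(f"{tok!r}\tlen={len(tok)}\t{code_points}")
    return lines


# Made-up entries for illustration only; replace with bpe.vocab() in practice.
sample_vocab = ["_", "\u2581", "\u2581a", "8"]

for line in describe_tokens(sample_vocab):
    print(line)
```

If the repeated lines turn out to have different lengths or code points, the duplicates are only apparent and the difference is being lost when the file is viewed or copied.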