YouTokenToMe

Vocabulary contains underscore multiple times?

Open RuABraun opened this issue 4 years ago • 0 comments

After training, if I write out the vocabulary:

for w in bpe.vocab():
    fh.write(f'{w}\n')  # fh is an open file handle

and then look inside the file, this is (a subset of) what I see:

_
8
-
7
3
6
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_

Why is this?
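
One way to narrow this down might be to dump each entry with its index, its repr, and its Unicode code points, in case the entries that look identical in the file are actually different strings (for example a different underscore-like character such as U+2581, a different number of characters, or an invisible character). This is only a diagnostic sketch: `describe_tokens` is a hypothetical helper, not part of YouTokenToMe, and the sample list is made up for illustration rather than taken from a trained model.

```python
import unicodedata


def describe_tokens(vocab):
    """Return one tab-separated line per token: index, repr, code points, names.

    `vocab` is any iterable of strings, e.g. the list returned by bpe.vocab().
    """
    lines = []
    for idx, token in enumerate(vocab):
        # Code points make look-alike or invisible characters visible.
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in token)
        # Characters without a Unicode name (e.g. control chars) get "UNKNOWN".
        names = ", ".join(unicodedata.name(ch, "UNKNOWN") for ch in token)
        lines.append(f"{idx}\t{token!r}\t{codepoints}\t{names}")
    return lines


# Made-up sample standing in for bpe.vocab(); these are assumptions,
# not tokens observed in a real model.
sample = ["_", "\u2581", "\u2581\u2581", "_\u200b"]
for line in describe_tokens(sample):
    print(line)
```

With a trained model, the same helper could be called as `describe_tokens(bpe.vocab())` and the result written to the file instead of the raw tokens. If every repeated line turns out to have exactly the same code points, the duplicates are real; otherwise the entries only look the same in a text editor.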

RuABraun, Apr 20 '20 17:04