`CLIPTokenizer` always ignores the first line in `bpe_merges`

Open • ProGamerGov opened this issue 3 years ago • 0 comments

🐛 Bug

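It seems that `CLIPTokenizer` skips the first line of the merges file unconditionally. In the vocab files used by open_clip and OpenAI, that line is a comment about the file, so skipping it is harmless there. A user-supplied merges file may not start with a comment, though, in which case its first merge rule would be dropped.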

https://github.com/pytorch/text/blob/main/torchtext/transforms.py#L351

This issue will probably be resolved once https://github.com/pytorch/text/issues/1624 is done, but in the meantime it is potentially problematic behavior that should at least be documented.
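
To illustrate the concern, here is a minimal sketch in plain Python. It does not call torchtext. The two helper functions are hypothetical, and the `#version` header prefix is an assumption about what the comment line looks like in the OpenAI-style vocab file. The first helper mimics the behavior described above, and the second shows one possible alternative that only skips the first line when it looks like a comment.

```python
from typing import List


def load_merges_unconditional(text: str) -> List[str]:
    """Mimic the reported behavior: always drop the first line."""
    return text.split("\n")[1:]


def load_merges_checked(text: str, comment_prefix: str = "#version") -> List[str]:
    """Hypothetical alternative: drop the first line only if it is a header comment.

    The default prefix is an assumption about the header format.
    """
    lines = text.split("\n")
    if lines and lines[0].startswith(comment_prefix):
        lines = lines[1:]
    return lines


# A merges file that starts with a header comment (assumed format).
with_header = "#version: 0.2\ni n\nt h"
# A merges file that starts directly with a merge rule.
without_header = "i n\nt h"

# With a header, both approaches agree.
print(load_merges_unconditional(with_header))
print(load_merges_checked(with_header))

# Without a header, the unconditional skip loses the first merge rule.
print(load_merges_unconditional(without_header))
print(load_merges_checked(without_header))
```

If the input has no header line, the unconditional version returns one merge rule fewer than the file contains, which is the case that seems worth documenting.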

ProGamerGov • Feb 27 '22 19:02