`CLIPTokenizer` always ignores the first line in `bpe_merges`
🐛 Bug
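It seems to me that the first line is skipped because, in the vocab files shipped by open_clip and OpenAI, it is a comment describing the file. That is not guaranteed for every input: if a user-supplied merges file has no such header, its first merge rule is dropped instead.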
https://github.com/pytorch/text/blob/main/torchtext/transforms.py#L351
This will probably be resolved once https://github.com/pytorch/text/issues/1624 is done, but in the meantime it is potentially problematic behavior that should at least be documented.
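
To illustrate the concern, here is a minimal standalone sketch in plain Python. It does not call torchtext. `load_merges_skip_first` is a hypothetical stand-in that mimics the reported behavior of always discarding the first line, and `load_merges_skip_header` is one possible alternative that only discards it when it looks like a header. The `#version` prefix check and the sample merge rules are assumptions made for this example, not something taken from the torchtext source.

```python
from typing import List


def load_merges_skip_first(text: str) -> List[str]:
    """Hypothetical stand-in for the reported behavior:
    drop the first line unconditionally."""
    return [line for line in text.split("\n")[1:] if line]


def load_merges_skip_header(text: str) -> List[str]:
    """Possible alternative: drop the first line only if it looks like
    a header comment (assumed here to start with '#version')."""
    lines = [line for line in text.split("\n") if line]
    if lines and lines[0].startswith("#version"):
        lines = lines[1:]
    return lines


# Made-up sample data: one file with a header comment, one without.
with_header = "#version: 0.2\ni n\nt h\na n\n"
without_header = "i n\nt h\na n\n"

# With a header, both loaders return all three merge rules.
print(load_merges_skip_first(with_header))
print(load_merges_skip_header(with_header))

# Without a header, the unconditional skip loses the first rule ("i n").
print(load_merges_skip_first(without_header))
print(load_merges_skip_header(without_header))
```

Until the behavior changes or is documented, a workaround along the same lines would be to make sure any custom merges file starts with a header line before passing it to the tokenizer.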