scenic
Nit: Specify units
Wasn't sure whether 77 meant words or characters. From the source, it looks like it's chars.
CLIP uses the byte-pair encoding (BPE) tokeniser implemented here: https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py
The resulting tokens are sub-word units, not characters, so the 77 is a count of BPE tokens (neither characters nor words).
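To illustrate why BPE tokens don't line up with characters, here is a simplified sketch of BPE encoding. It is not CLIP's tokeniser. `TOY_MERGES` is a hypothetical merge table made up for illustration, and the real tokeniser also does text cleanup, byte-level handling, and uses a learned merge table. The core loop is the same idea: start from single characters and repeatedly merge the adjacent pair with the best (lowest) merge rank.

```python
# Hypothetical, hand-picked merge table (lower index = merged earlier).
# CLIP's real merges are learned from data; these are for illustration only.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]


def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split `word` into BPE tokens using a ranked list of pair merges."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from individual characters
    while len(symbols) > 1:
        # Rank every adjacent pair; unknown pairs get an infinite rank.
        pairs = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(pairs)
        if best_rank == float("inf"):
            break  # no adjacent pair can be merged any further
        # Replace the best pair with its merged symbol.
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


print(bpe_encode("lower", TOY_MERGES))   # ['lower']: 5 characters, 1 token
print(bpe_encode("lowest", TOY_MERGES))  # ['low', 'e', 's', 't']: 6 characters, 4 tokens
```

With a realistic merge table, common words tend to collapse into one or a few tokens while rare strings split into many, so a 77-token limit can cover quite different numbers of characters depending on the text.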