
Context length understanding

Open Tortoise17 opened this issue 4 years ago • 1 comments

I have to ask an important question. The context length is 77. Does that mean the query search limit is 77 characters, 77 words, or something else I have misunderstood? Please guide me clearly.

Since the model has been trained on captions, I want to know its optimal use boundaries: how the text is taken into consideration, and what maximum size is ideal when searching the embeddings.

Tortoise17 avatar Nov 21 '21 09:11 Tortoise17

Just watched an explainer video from the YouTuber The AI Epiphany.

Near the beginning they do a 'hello world!' example. Experimenting, I found that each word counts for 1 token, as does each punctuation mark. The start and end of the string also count for 1 token each.

I recommend opening the Colab and trying out different strings to get a sense of how the tokens are counted.
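To make that counting concrete, here is a rough sketch. Note this is only an approximation for illustration: CLIP's real tokenizer (`clip.tokenize` in the openai/CLIP repo) is a lower-cased byte-pair encoding, so uncommon words can split into several sub-word tokens, and every string is padded to the fixed 77-slot context. For simple words and punctuation, though, the arithmetic matches what I observed: one token per piece, plus a start and an end token.

```python
# Rough, illustrative token counting only -- NOT CLIP's actual BPE.
# Assumes simple words and punctuation each cost one token, plus
# <start_of_text> and <end_of_text>, as observed in the Colab.
import re

CONTEXT_LENGTH = 77  # CLIP's fixed context size


def rough_token_count(text: str) -> int:
    """Approximate token count: one token per word or punctuation
    mark, plus 2 for the start/end tokens."""
    pieces = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text.lower())
    return len(pieces) + 2


count = rough_token_count("hello world!")
print(count)  # 'hello', 'world', '!' plus start/end -> 5
assert count <= CONTEXT_LENGTH
```

So 77 is a limit in tokens, not characters or bytes; a caption of roughly 70 simple words (fewer if rare words get split into sub-word pieces) fills the context.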

aztecman avatar Jun 23 '22 18:06 aztecman