
Why does tokenized_prompts.argmax point to 49407, '<|endoftext|>'?

Open · Harzva opened this issue 2 years ago • 1 comment

Can `<|endoftext|>` represent the global information of `tokenized_prompts`? Why does `tokenized_prompts.argmax(dim=-1)` pick out `'<|endoftext|>'` (ID 49407)? Does it play the same role as the cls_token of a transformer? Thanks.

Harzva · Dec 16 '22 00:12

Yes. `argmax` returns the position of the largest value in the input, which is the EOT token (ID 49407). Because of the autoregressive mask, the SOT token at the first position (where a CLS token would normally sit) can only attend to itself and cannot aggregate global information, so the network is instead trained to produce the text features at the position of EOT, which can attend to every token before it.

jongwook · Dec 17 '22 01:12
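
To make the answer above concrete, here is a minimal pure-Python sketch of the two points it makes: `argmax` over the token IDs lands on the EOT position, and under a causal mask only that late position can see the whole prompt. This is an illustration, not the CLIP code. The shortened context length, the word token IDs, and the helper names `pad`, `argmax`, and `visible_positions` are hypothetical; the SOT ID (49406) and zero padding are assumptions not stated in the thread.

```python
# Toy illustration only; real prompts are longer and are PyTorch tensors.
SOT_ID = 49406       # '<|startoftext|>' (assumed ID, not given in the thread)
EOT_ID = 49407       # '<|endoftext|>', the ID quoted in the question
PAD_ID = 0           # assumed padding value
CONTEXT_LENGTH = 8   # shortened for readability (hypothetical)


def pad(tokens, length=CONTEXT_LENGTH):
    """Wrap word tokens in SOT/EOT and right-pad to a fixed length."""
    row = [SOT_ID] + list(tokens) + [EOT_ID]
    return row + [PAD_ID] * (length - len(row))


def argmax(row):
    """Return the index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)


def visible_positions(query_pos):
    """Positions a query may attend to under a causal (autoregressive) mask."""
    return list(range(query_pos + 1))


# Two prompts with made-up word token IDs, all smaller than EOT_ID.
tokenized_prompts = [
    pad([320, 1125]),
    pad([320, 1125, 539, 320, 2368]),
]

# EOT has the largest ID in each row, so argmax returns its position.
eot_positions = [argmax(row) for row in tokenized_prompts]
print(eot_positions)  # [3, 6]

for row, eot in zip(tokenized_prompts, eot_positions):
    # SOT at position 0 sees only itself; EOT sees SOT, every word, and itself.
    print("SOT sees:", visible_positions(0), "| EOT sees:", visible_positions(eot))
```

In the sketch, the SOT position can only ever see index 0, while the EOT position sees the full prompt, which is why the feature at the `argmax` index is the natural place to read out a sentence-level representation.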