CLIP
Why does tokenized_prompts.argmax(dim=-1) pick 49407, '<|endoftext|>'?
Can <|endoftext|> represent the global information of tokenized_prompts? Why does tokenized_prompts.argmax(dim=-1) select '<|endoftext|>': 49407? Does it play the role of the cls_token in a transformer? Thanks
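Yes. argmax returns the index of the largest value in each row. The largest token id in each prompt is the EOT token (49407, the last id in the vocabulary), so tokenized_prompts.argmax(dim=-1) gives the position of EOT. Because of the autoregressive (causal) mask, each position can only attend to itself and the tokens before it. An SOT token (or a CLS token serving the same purpose) at the beginning therefore cannot aggregate global information, so the network is instead trained to produce the text features at the position of EOT, the first position that has seen the whole prompt.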
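A minimal plain-Python sketch of the two points in the answer: argmax over token ids lands on EOT, and under a causal mask only positions from EOT onward have seen every prompt token. It assumes CLIP's special ids (SOT = 49406, EOT = 49407) and zero padding; the ids between SOT and EOT are hypothetical placeholders, and the "features" are stand-in tuples rather than real transformer outputs.

```python
# Assumed special-token ids from CLIP's BPE vocabulary (49408 entries).
SOT = 49406  # <|startoftext|>
EOT = 49407  # <|endoftext|>: the largest id, so argmax lands on it

def argmax_last_dim(rows):
    """Index of the largest value in each row, like tensor.argmax(dim=-1)."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in rows]

def causal_visible(seq_len):
    """Positions each query position may attend to under a causal (autoregressive) mask."""
    return [set(range(i + 1)) for i in range(seq_len)]

def pool_at_eot(tokens, features):
    """Select the per-position feature at the EOT index of every sequence."""
    eot_idx = argmax_last_dim(tokens)
    return [features[b][i] for b, i in enumerate(eot_idx)]

# Two prompts padded with 0 to a context length of 8.
# The ids between SOT and EOT are hypothetical placeholders, not real BPE ids.
tokens = [
    [SOT, 320, 1125, 539, EOT, 0, 0, 0],
    [SOT, 320, EOT, 0, 0, 0, 0, 0],
]
# Stand-in per-position "features": (sequence, position) tuples.
features = [[(b, i) for i in range(8)] for b in range(len(tokens))]

eot_idx = argmax_last_dim(tokens)
visible = causal_visible(8)
pooled = pool_at_eot(tokens, features)

print(eot_idx)                      # [4, 2]
print(visible[0])                   # {0}: SOT only sees itself
print(sorted(visible[eot_idx[0]]))  # [0, 1, 2, 3, 4]: EOT sees the whole prompt
print(pooled)                       # [(0, 4), (1, 2)]
```

Plain Python keeps the sketch runnable without PyTorch; with tensors the same selection can be written as x[torch.arange(x.shape[0]), tokenized_prompts.argmax(dim=-1)]. The trick relies on EOT having the largest id and appearing once per row.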