CLIP
About the position embedding scale
Thanks for the great work. In this code, the position embedding initialization is multiplied by a scaling factor, which the original ViT initialization does not do. The paper also mentions that you "use a slightly different initialization scheme". How should this scaling be explained?
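To make the question concrete, here is a minimal sketch of the operation I am asking about, written in plain Python rather than copied from the repository. I am assuming the scaling factor is `width ** -0.5` applied to standard normal samples; the function names, the seed, and the sizes (`width = 768`, 50 positions) are only illustrative.

```python
import math
import random


def init_positional_embedding(num_positions, width, scaled=True, seed=0):
    """Draw a num_positions x width table from N(0, 1).

    If scaled is True, every entry is multiplied by width ** -0.5
    (assumed form of the scaling factor, for illustration only).
    """
    rng = random.Random(seed)
    scale = width ** -0.5 if scaled else 1.0
    return [
        [scale * rng.gauss(0.0, 1.0) for _ in range(width)]
        for _ in range(num_positions)
    ]


def std(table):
    """Population standard deviation over all entries of the table."""
    values = [v for row in table for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


width = 768          # hypothetical embedding width
num_positions = 50   # hypothetical: 7 * 7 patches + 1 class token

scaled = init_positional_embedding(num_positions, width, scaled=True)
unscaled = init_positional_embedding(num_positions, width, scaled=False)

# The scaled table has a standard deviation smaller by a factor of width ** -0.5.
print("std with scaling:   ", std(scaled))
print("std without scaling:", std(unscaled))
```

Under this assumption each embedding vector has an expected squared norm of about 1 regardless of `width`, since `width * (width ** -0.5) ** 2 = 1`. Whether that is the intended motivation is exactly what I would like to confirm.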