Results: 2 issues of rtfgithub
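In the code, the image encoder is frozen during first-stage training, while the learnable text tokens and the text encoder are both trainable. This does not match the paper, which describes only the text tokens as learnable, with the image encoder and text encoder both frozen.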
Why is the triplet loss applied to img_feature_last? Here, img_feature_last is the output of the second-to-last module of the ViT model.