CLIP-ReID
Will L2 normalization of image and text features lead to better results?
When aligning image and text, why are the image and text features not L2-normalized? Without normalization, won't the norm of the image features grow very large just to reduce the i2t loss in the second stage of training?
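To make the concern concrete, here is a toy sketch in plain Python. It is not taken from the CLIP-ReID code: the helper names (`i2t_loss`, `scale`, `l2_normalize`), the 2-D features, and the assumption that image i matches text i are all made up for illustration, and no temperature is applied. It only shows that a softmax cross-entropy over raw dot products can be lowered by scaling up the image features, while the same loss on L2-normalized features does not change with the scale.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Divide by the Euclidean norm so the vector has unit length.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def scale(feats, s):
    # Multiply every feature vector by the scalar s.
    return [[s * x for x in v] for v in feats]


def i2t_loss(image_feats, text_feats, normalize):
    """Mean image-to-text cross-entropy; image i is assumed to match text i."""
    if normalize:
        image_feats = [l2_normalize(v) for v in image_feats]
        text_feats = [l2_normalize(t) for t in text_feats]
    total = 0.0
    for i, img in enumerate(image_feats):
        logits = [dot(img, t) for t in text_feats]
        # Numerically stable log-sum-exp over the logits of this image.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(image_feats)


# Hypothetical toy features: each image is already closest to its own text.
texts = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8]]

for s in (1.0, 10.0):
    scaled = scale(images, s)
    raw = i2t_loss(scaled, texts, normalize=False)
    normed = i2t_loss(scaled, texts, normalize=True)
    print(f"scale={s}: raw loss={raw:.4f}, normalized loss={normed:.4f}")
```

In this toy setting the raw loss drops when the image features are multiplied by 10 even though their directions are unchanged, which is the effect the question is asking about. Whether this actually happens in the second training stage depends on the real training setup.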