IHikari

2 comments by IHikari

In the original CLIP architecture, the final text and image features are obtained by projecting the EOS and CLS tokens, respectively. If the read-only mask can prevent the EOS or CLS tokens...
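For readers unfamiliar with that pooling step, here is a rough sketch of how the EOS and CLS positions are selected and projected. It is a toy, pure-Python illustration rather than code from any repository: the helper names (`project`, `text_feature`, `image_feature`), the tiny hidden states, and the projection matrix are all made up for the example. It relies on one known detail of CLIP's tokenizer, namely that the EOS token has the highest id in the vocabulary (49407), so taking the argmax over token ids locates it.

```python
# Toy sketch of CLIP-style feature pooling. All names and sizes are illustrative.

def project(vec, proj):
    """Multiply a length-d vector by a d x k projection matrix."""
    return [sum(v * row[j] for v, row in zip(vec, proj)) for j in range(len(proj[0]))]


def text_feature(token_ids, hidden, proj):
    """Select the hidden state at the EOS position, then project it.

    EOS has the largest token id in CLIP's vocabulary, so the argmax over
    the ids gives its position (padding uses id 0).
    """
    eos_pos = max(range(len(token_ids)), key=lambda i: token_ids[i])
    return project(hidden[eos_pos], proj)


def image_feature(hidden, proj):
    """Select the hidden state of the CLS token (position 0), then project it."""
    return project(hidden[0], proj)


# Hypothetical inputs: 6 token positions, hidden size 2, projected size 3.
token_ids = [49406, 320, 1125, 49407, 0, 0]  # SOS, two word tokens, EOS, padding
hidden = [[float(i), float(i) + 0.5] for i in range(6)]
proj = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]

text_feat = text_feature(token_ids, hidden, proj)   # uses hidden[3]
image_feat = image_feature(hidden, proj)            # uses hidden[0]
```

Only one position per sequence feeds the final feature in this sketch, which is why anything that changes what the EOS or CLS token can attend to would directly affect the output.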

Got it! Thank you for the patient reply and for your work.