recommenders [Question] two-tower-model + infoNCE how to optimize

Open • unshaven opened this issue 1 year ago • 1 comment

I tried a two-tower model (user and query towers) trained with contrastive learning in a real industrial scenario. The training samples are all actual click samples, and the loss function is InfoNCE. I have run into two issues:

The model performs best with only one MLP layer; the more MLP layers I add, the worse HR@100 becomes.
Using L2 normalization at the end of the model degrades performance.

As a result, I currently use only one MLP layer and no normalization. Could you please offer some advice or share your experience on what I should try?
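To make the setup concrete, here is a minimal sketch of InfoNCE with in-batch negatives, with L2 normalization as an option. The post does not say which framework, negative sampling scheme, or temperature is used, so the in-batch negatives, the `temperature` parameter, and the toy vectors below are all assumptions for illustration. It uses plain Python to stay self-contained; a real model would use a framework's vectorized ops.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def info_nce(user_vecs, item_vecs, temperature=1.0, normalize=False):
    """Mean InfoNCE loss with in-batch negatives (an assumed setup).

    Row i of user_vecs is paired with row i of item_vecs (the click);
    every other item in the batch acts as a negative for user i.
    """
    if normalize:
        user_vecs = [l2_normalize(u) for u in user_vecs]
        item_vecs = [l2_normalize(v) for v in item_vecs]
    total = 0.0
    for i, u in enumerate(user_vecs):
        # Similarity of user i to every item in the batch, scaled by temperature.
        logits = [sum(a * b for a, b in zip(u, v)) / temperature for v in item_vecs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax of the positive
    return total / len(user_vecs)

# Hypothetical tower outputs for a batch of two (user, clicked item) pairs.
users = [[3.0, 0.0], [0.0, 3.0]]
items = [[3.0, 0.0], [0.0, 3.0]]

raw = info_nce(users, items)                                       # dot product
cos_t1 = info_nce(users, items, normalize=True)                    # cosine, temperature 1
cos_t01 = info_nce(users, items, temperature=0.1, normalize=True)  # cosine, temperature 0.1
print(raw, cos_t1, cos_t01)
```

In this toy batch the pairs are perfectly aligned, yet the normalized loss at temperature 1 stays near 0.31 because cosine logits are confined to [-1, 1]; dividing by a smaller temperature widens that range again. Whether this interaction explains the degradation I see with L2 normalization is only a guess.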

Jun 05 '24 03:06 unshaven
