recommenders
[Question] How to optimize a two-tower model trained with InfoNCE
I trained a two-tower model (a user tower and a query tower) with contrastive learning in a real industrial scenario. All training samples are actual clicks, and the loss function is InfoNCE. I observed two things that I would like help with:
- The model performs best with only one layer, and the more MLP layers I add, the worse the HR@100 becomes.
- Using L2 normalization at the end of the model degrades performance.
As a result, I currently use only one MLP layer and no normalization. Could you please offer some advice or share your experience on what I should try next?
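
For reference, a minimal plain-Python sketch of the kind of loss setup described above. It is not the actual training code: it assumes in-batch negatives (the question does not say how negatives are sampled), and the names `l2_normalize`, `info_nce`, `normalize`, and `temperature` are hypothetical. The `temperature` parameter in particular is an assumption, since the question does not mention one.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(user_embs, query_embs, temperature=1.0, normalize=False):
    """Mean InfoNCE loss with in-batch negatives (assumed setup).

    Row i of user_embs and row i of query_embs form a clicked (positive)
    pair; every other query in the batch acts as a negative for user i.
    """
    if normalize:
        # Optional L2 normalization of the tower outputs.
        user_embs = [l2_normalize(u) for u in user_embs]
        query_embs = [l2_normalize(q) for q in query_embs]

    total = 0.0
    for i, u in enumerate(user_embs):
        # Dot-product similarity to every query, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(u, q)) / temperature
                  for q in query_embs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive at index i.
        total += log_z - logits[i]
    return total / len(user_embs)


# Toy batch of two perfectly aligned pairs (made-up values).
users = [[2.0, 0.0], [0.0, 3.0]]
queries = [[5.0, 0.0], [0.0, 7.0]]
loss_t1 = info_nce(users, queries, temperature=1.0, normalize=True)
loss_t01 = info_nce(users, queries, temperature=0.1, normalize=True)
```

With `normalize=True` the similarities are cosines bounded in [-1, 1], so the value of `temperature` controls how sharp the softmax can become. In this toy batch, `loss_t01` is lower than `loss_t1` even though the embeddings are identical.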