mprm
Fluctuating loss with channel attention head
Dear plusmultiply,
Thank you very much for sharing the code. During training, the PCAM, SA, and PSA heads work well and the loss decreases steadily. However, when I use the channel attention head, the loss fluctuates constantly, resulting in low accuracy. I have tried it both on its own and combined with other heads; in both cases the loss is unstable. Have you encountered the same problem, or could you suggest a possible explanation?
I'm looking forward to your reply.