MDCS
About diversity softmax
I would like to ask: when λ < 0, why does the expert model focus on the head categories? Shouldn't λ < 0 lead to a decrease in the predicted probability of the head categories?
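
To make the question concrete, here is a minimal sketch of how I understand the adjusted softmax. It assumes the logit-adjustment form softmax(z + λ · log π), where π is the class prior estimated from the training counts. That form, the function names, and the toy class counts are my own assumptions for illustration and are not taken from the MDCS code.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_softmax(logits, class_counts, lam):
    # Hypothetical form (an assumption, not the repository's code):
    # add lam * log(prior) to each raw logit before the softmax.
    total = sum(class_counts)
    log_prior = [math.log(c / total) for c in class_counts]
    return softmax([z + lam * lp for z, lp in zip(logits, log_prior)])


# Toy long-tailed setup: index 0 is the head class, index 2 is the tail class.
logits = [1.0, 1.0, 1.0]
class_counts = [1000, 100, 10]

for lam in (-0.5, 0.0, 1.0):
    probs = adjusted_softmax(logits, class_counts, lam)
    print(lam, [round(p, 3) for p in probs])
```

With equal raw logits, the head-class probability in this sketch shrinks as λ drops below 0, which is the effect I expected. It only shows the adjusted probability computed from fixed logits, not what the trained expert ends up predicting.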