DST-CBC
Question about the paper
Hi! I just read your paper on DMT and quite appreciate your work. But I can't fully understand this statement in the paper: "It can be interpreted that a relatively larger γ1 represents a more emphasized entropy minimization, a larger γ2 represents a more emphasized mutual learning. Large γ values are often better for high-noise scenarios, or to maintain larger inter-model disagreement." Could you please explain it? Thanks a lot!
@Hugo-cell111 FYI, a larger γ corresponds to larger differences in loss weighting. Since loss weighting is the core of the dynamic loss, hence the expression "emphasize".
- γ1 is used when models predict the same label, which corresponds to entropy minimization.
- γ2 is used when models predict different labels, which corresponds to mutual learning.
As for the last statement on high noise and disagreement, it is more empirical. You can understand it as the effect of an overall lower learning rate (although not exactly so, considering the exponential dynamic weight): the models won't take large steps towards noisy labels or towards each other.
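To make the role of γ1 and γ2 more concrete, here is a rough PyTorch sketch of a confidence-based dynamic weight with two exponents. This is not the exact code in this repo; the function name, signature, and the choice of which model's probabilities enter each branch are illustrative assumptions, so please check them against the paper and the released code.

```python
import torch

def dynamic_weight(logits_current, pseudo_labels, gamma1=5.0, gamma2=5.0):
    """Illustrative sketch (not the repo's implementation) of a per-sample
    dynamic loss weight controlled by two exponents.

    logits_current: predictions of the model being trained, shape (N, C, ...)
    pseudo_labels:  hard labels produced by the other model, shape (N, ...)
    gamma1: exponent used where the two models agree (entropy minimization)
    gamma2: exponent used where they disagree (mutual learning)
    """
    probs = torch.softmax(logits_current, dim=1)

    # Current model's own most-confident prediction.
    confidence, own_labels = probs.max(dim=1)
    agree = own_labels == pseudo_labels

    # Probability the current model assigns to the other model's label.
    p_on_pseudo = probs.gather(1, pseudo_labels.unsqueeze(1)).squeeze(1)

    # Larger gamma -> weights decay faster as confidence drops,
    # i.e. the corresponding term is "emphasized" more sharply.
    weight = torch.where(agree, confidence ** gamma1, p_on_pseudo ** gamma2)
    return weight
```

With a large γ, low-confidence positions receive a weight close to zero, which is one way to see why large γ values behave somewhat like a lower effective learning rate on noisy pseudo-labels.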