fixmatch
Why not compute consistency on the raw features or predictions directly?
Hi All,
Thanks for the nice work.
I have a question regarding the depiction in Figure 1. Why do you compute the consistency loss after sharpening the predictions? Why not minimize a form of KL divergence between the model features or the raw predictions instead? Did you observe that the sharpened form led to better training, or what was the rationale?
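To make the question concrete, here is a rough sketch of the two variants I have in mind. The function names, the temperature value, and the soft-target cross-entropy are my own assumptions for illustration, not necessarily what the paper or this repo implements:

```python
import torch
import torch.nn.functional as F

def sharpen(probs, temperature=0.5):
    # Temperature sharpening as I understand it from Figure 1 (temperature is my guess).
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=-1, keepdim=True)

def consistency_with_sharpening(logits_weak, logits_strong):
    # Variant I read from Figure 1: sharpen the weak-augmentation prediction,
    # then use it as a (detached) soft target for the strong-augmentation prediction.
    with torch.no_grad():
        target = sharpen(F.softmax(logits_weak, dim=-1))
    log_p_strong = F.log_softmax(logits_strong, dim=-1)
    return -(target * log_p_strong).sum(dim=-1).mean()

def consistency_raw_kl(logits_weak, logits_strong):
    # Alternative I am asking about: KL divergence between the raw
    # (unsharpened) predicted distributions.
    p_weak = F.softmax(logits_weak, dim=-1).detach()
    log_p_strong = F.log_softmax(logits_strong, dim=-1)
    return F.kl_div(log_p_strong, p_weak, reduction="batchmean")

# Example with dummy logits, just to show the shapes:
logits_weak = torch.randn(8, 10)
logits_strong = torch.randn(8, 10)
print(consistency_with_sharpening(logits_weak, logits_strong))
print(consistency_raw_kl(logits_weak, logits_strong))
```

In other words, was the extra sharpening step important in practice, or would the plain KL-style consistency above have worked comparably?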
Thanks!