knowledge-distillation-keras correction for soft targets loss may be needed

Open arsenyinfo opened this issue 6 years ago • 1 comment

Quoting original paper (block 2):

Since the magnitudes of the gradients produced by the soft targets scale as 1/T^2 it is important to multiply them by T^2 when using both hard and soft targets.

It looks like this correction is not included in your knowledge_distillation_loss.
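
For illustration, here is a minimal NumPy sketch of what the correction could look like. It is not the repository's knowledge_distillation_loss: the function name distillation_loss, its argument layout, and the alpha weighting between the hard and soft terms are all assumptions made for this example. The only part taken from the paper is the T^2 factor applied to the soft-target term.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis at a given temperature."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(targets, probs, eps=1e-12):
    """Mean cross-entropy between target distributions and predicted probabilities."""
    return float(np.mean(-np.sum(targets * np.log(probs + eps), axis=-1)))


def distillation_loss(y_true, student_logits, teacher_logits,
                      temperature=4.0, alpha=0.5):
    """Hypothetical combined loss (names and alpha weighting are assumptions).

    hard term: cross-entropy with the one-hot labels at T = 1
    soft term: cross-entropy with the teacher's softened outputs at T,
               multiplied by T^2 so its gradients stay on the same scale
               as the hard term.
    """
    hard = cross_entropy(y_true, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

Without the temperature ** 2 factor, raising T would shrink the soft term's contribution relative to the hard term, so the effective balance between the two would change with T rather than being set by alpha alone.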

Jun 11 '18 12:06 arsenyinfo

@arsenyinfo Do you know where the correction is needed?

May 22 '21 10:05 sub1120
