
How to apply pair-wise distillation to the depth prediction task


Thank you for sharing this great work!

Q1: Where should the pair-wise distillation loss be applied: only at the end of the encoder (e.g. the 1/16 H×W feature map of a ResNet), or at every scale of the encoder (1/16, 1/8, 1/4, ...)?

Q2: Can pair-wise distillation work when the teacher's and student's encoders have different downsample rates (e.g. the student downsamples the input by 1/8 while the teacher downsamples by 1/16), or different decoder structures?

Q3: Can this method be used to distill from VNL into a structure like FastDepth (which differs from the VNL student in the decoder)? The VNL student may have a heavy decoder.
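For reference, my current understanding of the pair-wise loss from the paper is roughly the sketch below: match the cosine-similarity (affinity) maps of teacher and student features with a squared loss. The function name is my own, and the spatial-alignment step (my attempt at Q2) is just a guess, not the repo's code:

```python
import torch
import torch.nn.functional as F

def pairwise_distillation_loss(feat_s, feat_t):
    # feat_s: student features (B, C_s, H_s, W_s)
    # feat_t: teacher features (B, C_t, H_t, W_t); channel counts may differ,
    # since the affinity maps below only depend on spatial positions.
    # If the encoders downsample differently (my Q2 example: student at 1/8,
    # teacher at 1/16), pool the finer student map to the teacher's resolution
    # so the two affinity maps are comparable.
    if feat_s.shape[2:] != feat_t.shape[2:]:
        feat_s = F.adaptive_avg_pool2d(feat_s, feat_t.shape[2:])
    s = F.normalize(feat_s.flatten(2), dim=1)      # (B, C_s, N), unit-norm per location
    t = F.normalize(feat_t.flatten(2), dim=1)      # (B, C_t, N), N = H*W
    affinity_s = torch.bmm(s.transpose(1, 2), s)   # (B, N, N) cosine similarities
    affinity_t = torch.bmm(t.transpose(1, 2), t)
    return F.mse_loss(affinity_s, affinity_t)      # squared affinity difference
```

Since the N×N affinity matrix grows quadratically with the number of pixels, I would guess it is meant for a coarse map such as the last encoder stage rather than every scale, but I'd appreciate the authors confirming (Q1).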

mzy97 avatar Feb 23 '21 13:02 mzy97

I'm also confused about the distillation losses for the other two tasks, especially the pixel-wise loss. The pixel-wise loss in the paper is defined for the segmentation task as a KL divergence between per-pixel class distributions, which is obviously not suitable for the depth task. I really wonder how the pixel-wise loss was implemented for depth, even though the author explains that it doesn't work for the depth task.
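For concreteness, here is my best guess at the two variants (the function names are mine, and the depth version is pure speculation on my part, not something from the paper or repo):

```python
import torch.nn.functional as F

def pixelwise_kd_segmentation(logits_s, logits_t):
    # The paper's formulation for segmentation: KL divergence between the
    # per-pixel class distributions of student and teacher.
    # logits_s, logits_t: (B, K, H, W) for K classes.
    log_p_s = F.log_softmax(logits_s, dim=1)
    p_t = F.softmax(logits_t, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # (B, H, W) per-pixel KL
    return kl.mean()

def pixelwise_kd_depth(depth_s, depth_t):
    # Speculative depth analogue: regress each student pixel toward the
    # teacher's prediction (L2 here; berHu or log-L1 would also be natural).
    return F.mse_loss(depth_s, depth_t)
```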

djmth avatar Mar 11 '21 13:03 djmth