RepDistiller
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
To reproduce the baseline result on my machine (KD from rn34 to rn18), I would like to know the hyperparameter settings for knowledge distillation on ImageNet, especially the weights for...
I think it should be `softmax` instead. Otherwise `p_t` and `p_s` are not comparable. Could you please explain why? https://github.com/HobbitLong/RepDistiller/blob/dcc043277f2820efafd679ffb82b8e8195b7e222/distiller_zoo/KD.py#L13-L17
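For context, here is a minimal sketch of the usual PyTorch formulation of Hinton-style KD (the name `kd_loss` and the argument names are placeholders, not the repo's code). The student side goes through `log_softmax` only because `F.kl_div` expects log-probabilities as its input, while the teacher target stays a plain `softmax`, so the two are still compared on the same probability scale:

```python
import torch.nn.functional as F

def kd_loss(logits_s, logits_t, T=4.0):
    # F.kl_div expects log-probabilities for its input and plain
    # probabilities for its target, so the student uses log_softmax
    # while the teacher uses softmax.
    p_s = F.log_softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    # Multiplying by T^2 keeps the gradient magnitude comparable to an
    # unsoftened loss term (as suggested by Hinton et al., 2015).
    return F.kl_div(p_s, p_t, reduction='batchmean') * T * T
```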
What excellent work! I have learned so much from this paper, but I have a question about the normalization constants `Z_v1` and `Z_v2` in `ContrastMemory`. The `Z_v1` and `Z_v2`...
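For readers hitting the same question: a common NCE trick is to treat the partition function Z as a constant that is estimated once from the first batch and then reused, rather than summing over all data points every step. The toy sketch below only illustrates that idea under this assumption; the class and variable names are hypothetical, not the repo's:

```python
import torch

class ToyMemory:
    """Toy illustration of estimating an NCE normalization constant once."""
    def __init__(self, n_data):
        self.n_data = n_data
        self.Z = -1.0  # unset until the first forward pass

    def forward(self, scores):
        # scores: unnormalized exp(similarity / T) values, shape (batch, K+1)
        if self.Z < 0:
            # Estimate Z from the first batch and freeze it; later batches
            # reuse the same constant instead of recomputing it.
            self.Z = scores.mean().item() * self.n_data
        return scores / self.Z
```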
@HobbitLong Thank you very much for making the effort to clean and post your code for these benchmarks! I'm sure that you don't have time to post code for the...
Hi, thanks for your great work. I wonder whether the CRD distillation loss could be used in image enhancement tasks like denoising/super-resolution/deblurring. I would appreciate it if someone could answer...
Hello Sir, I would like to start by saying how great this work is! I would also like to know if the ResNet models can be used directly on CIFAR-10...
The original ResNet uses a 7x7 conv and max pooling at the beginning, but this repo uses a 3x3 conv and no max pooling. Is there any reason for doing this?
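A rough side-by-side of the two stems may help (channel counts are illustrative; the repo's CIFAR ResNets may differ). The ImageNet stem downsamples the input by 4x right away, which would shrink a 32x32 CIFAR image to 8x8 and discard most spatial detail, so CIFAR-style variants conventionally keep full resolution with a 3x3 stride-1 conv and no max pool:

```python
import torch.nn as nn

# ImageNet-style stem: designed for 224x224 inputs, downsamples 4x immediately.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# CIFAR-style stem: 32x32 inputs are kept at full resolution.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)
```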
Sorry! When I try to run fetch_pretrained_teachers.sh, I can't download the teacher models. Is the server down?
There are several kinds of hyperparameters: 1. teacher model hyperparameters; 2. student model hyperparameters; 3. KD hyperparameters (e.g., the balance weights for the different losses); 4. training hyperparameters...
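To make point 3 concrete, here is a hedged sketch of how such balance weights typically combine the loss terms in a student training step. The weight and argument names are placeholders for illustration, not the repo's exact option names:

```python
import torch.nn.functional as F

def total_loss(logits_s, target, loss_kd, loss_distill,
               gamma=1.0, alpha=1.0, beta=1.0):
    """Weighted sum of the usual three terms in KD-style training.

    gamma/alpha/beta are the balance weights referred to in point 3;
    the names here are illustrative.
    """
    loss_cls = F.cross_entropy(logits_s, target)   # supervised CE on labels
    return gamma * loss_cls + alpha * loss_kd + beta * loss_distill
```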
In the paper, for the Similarity-Preserving loss, the normalization is applied before the matrix multiplication. Does the order matter for performance?
```python
import torch
org_f_s = torch.rand((64, 96))
org_f_t = ...
```
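Whichever order the paper and the repo actually use, the two orderings are not mathematically equivalent, which a small sketch makes concrete (shapes follow the snippet above; this is purely illustrative, not the repo's code). Whether the difference affects accuracy is an empirical question:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
f_s = torch.rand(64, 96)  # student features, as in the snippet above

# Order A: Gram matrix first, then L2-normalize its rows.
g_after = F.normalize(f_s @ f_s.t(), p=2, dim=1)

# Order B: L2-normalize the feature rows first, then take the Gram matrix
# (this yields a cosine-similarity matrix instead).
f_n = F.normalize(f_s, p=2, dim=1)
g_before = f_n @ f_n.t()

# The two results generally differ, so the order is not a no-op.
print((g_after - g_before).abs().max())
```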