RepDistiller
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
To reproduce the baseline result on my machine (KD from rn34 to rn18), I would like to know the hyperparameter settings for knowledge distillation on ImageNet, especially the weights for...
I think it should be `softmax` instead. Otherwise `p_t` and `p_s` are not comparable. Could you please explain why? https://github.com/HobbitLong/RepDistiller/blob/dcc043277f2820efafd679ffb82b8e8195b7e222/distiller_zoo/KD.py#L13-L17
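For context, here is a minimal sketch of the usual PyTorch formulation of Hinton-style KD (the name `kd_loss` and the argument names are placeholders, not the repo's code). The student side goes through `log_softmax` only because `F.kl_div` expects log-probabilities as its input, while the teacher target stays a plain `softmax`, so the two are still compared on the same probability scale:

```python
import torch.nn.functional as F

def kd_loss(logits_s, logits_t, T=4.0):
    # F.kl_div expects log-probabilities for its input and plain
    # probabilities for its target, so the student uses log_softmax
    # while the teacher uses softmax.
    p_s = F.log_softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    # Multiplying by T^2 keeps the gradient magnitude comparable to an
    # unsoftened loss term (as suggested by Hinton et al., 2015).
    return F.kl_div(p_s, p_t, reduction='batchmean') * T * T
```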
What excellent work! I have learned so much from this paper, but I have a question about the normalization constants `Z_v1` and `Z_v2` in `ContrastMemory`. The `Z_v1` and `Z_v2`...
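For readers hitting the same question: a common NCE trick is to treat the partition function Z as a constant that is estimated once from the first batch and then reused, rather than summing over all data points every step. The toy sketch below only illustrates that idea under this assumption; the class and variable names are hypothetical, not the repo's:

```python
import torch

class ToyMemory:
    """Toy illustration of estimating an NCE normalization constant once."""
    def __init__(self, n_data):
        self.n_data = n_data
        self.Z = -1.0  # unset until the first forward pass

    def forward(self, scores):
        # scores: unnormalized exp(similarity / T) values, shape (batch, K+1)
        if self.Z < 0:
            # Estimate Z from the first batch and freeze it; later batches
            # reuse the same constant instead of recomputing it.
            self.Z = scores.mean().item() * self.n_data
        return scores / self.Z
```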
@HobbitLong Thank you very much for making the effort to clean and post your code for these benchmarks! I'm sure that you don't have time to post code for the...
Hi, thanks for your great work. I wonder whether the CRD distillation loss could be used in image enhancement tasks like denoising/super-resolution/deblurring. I would appreciate it if someone could answer...
Hello Sir, I would like to start by saying how great this work is! I would also like to know if the ResNet models can be used directly on CIFAR-10...
The original ResNet uses a 7x7 conv and max pooling at the beginning, but this repo uses a 3x3 conv and no max pooling. Is there any reason for doing this?
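A rough side-by-side of the two stems may help (channel counts are illustrative; the repo's CIFAR ResNets may differ). The ImageNet stem downsamples the input by 4x right away, which would shrink a 32x32 CIFAR image to 8x8 and discard most spatial detail, so CIFAR-style variants conventionally keep full resolution with a 3x3 stride-1 conv and no max pool:

```python
import torch.nn as nn

# ImageNet-style stem: designed for 224x224 inputs, downsamples 4x immediately.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# CIFAR-style stem: 32x32 inputs are kept at full resolution.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)
```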
Sorry! When I try to run fetch_pretrained_teachers.sh, I can't download the teacher models. Is the server down?
There are several kinds of hyperparameters: 1. teacher model hyperparameters; 2. student model hyperparameters; 3. KD hyperparameters (e.g., the balance weights for the different losses); 4. training hyperparameters...
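To make point 3 concrete, here is a hedged sketch of how such balance weights typically combine the loss terms in a student training step. The weight and argument names are placeholders for illustration, not the repo's exact option names:

```python
import torch.nn.functional as F

def total_loss(logits_s, target, loss_kd, loss_distill,
               gamma=1.0, alpha=1.0, beta=1.0):
    """Weighted sum of the usual three terms in KD-style training.

    gamma/alpha/beta are the balance weights referred to in point 3;
    the names here are illustrative.
    """
    loss_cls = F.cross_entropy(logits_s, target)   # supervised CE on labels
    return gamma * loss_cls + alpha * loss_kd + beta * loss_distill
```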
In the paper, for the Similarity-Preserving loss, the normalization is applied before the matrix multiplication. Does the order matter for performance?
```python
import torch
org_f_s = torch.rand((64, 96))
org_f_t = ...
```
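Whichever order the paper and the repo actually use, the two orderings are not mathematically equivalent, which a small sketch makes concrete (shapes follow the snippet above; this is purely illustrative, not the repo's code). Whether the difference affects accuracy is an empirical question:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
f_s = torch.rand(64, 96)  # student features, as in the snippet above

# Order A: Gram matrix first, then L2-normalize its rows.
g_after = F.normalize(f_s @ f_s.t(), p=2, dim=1)

# Order B: L2-normalize the feature rows first, then take the Gram matrix
# (this yields a cosine-similarity matrix instead).
f_n = F.normalize(f_s, p=2, dim=1)
g_before = f_n @ f_n.t()

# The two results generally differ, so the order is not a no-op.
print((g_after - g_before).abs().max())
```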