Minxuan Qin

5 comments of Minxuan Qin

Thank you for your reply! So you have distilled a lightweight image encoder with only 6 layers, where the first two layers do not contain attention. For inference,...
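
For context, a minimal sketch of what such a student encoder could look like, assuming the attention-free layers are plain convolutional blocks; the block designs, the names `ConvBlock` and `AttnBlock`, and the channel width of 256 are my assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Attention-free block (hypothetical stand-in for the first two layers):
    depthwise conv for spatial mixing plus a pointwise MLP."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        x = x + self.dwconv(self.norm(x))
        return x + self.mlp(x)

class AttnBlock(nn.Module):
    """Standard multi-head self-attention block over flattened tokens."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, HW, C)
        n = self.norm(t)
        t = t + self.attn(n, n, n, need_weights=False)[0]
        return t.transpose(1, 2).reshape(b, c, h, w)

# 6-layer student: the first two layers carry no attention.
student = nn.Sequential(*[ConvBlock(256) for _ in range(2)],
                        *[AttnBlock(256) for _ in range(4)])
```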

Thank you for your quick reply! I am not familiar with flash attention, so this may be a silly question: based on your answer and the code, I think flash...
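
Since the question touches on how flash attention behaves at inference time, here is a minimal sketch of how a fused flash-attention kernel is typically invoked in PyTorch; this uses the stock `torch.nn.functional.scaled_dot_product_attention` API and is not taken from the repository's code.

```python
import torch
import torch.nn.functional as F

# Toy Q/K/V: batch 1, 8 heads, 1024 tokens, head dim 64 (half precision on GPU).
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch dispatches this call to a fused FlashAttention kernel when the
# hardware and dtype allow it; the result equals softmax(QK^T / sqrt(d)) V,
# only the memory-access pattern differs from the naive implementation.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```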

I have another question regarding the distillation process: from `utils/prepare_nnunet.py`, the images and labels from one dataset should be stored under `label_name/dataset_name/imagesTr` and `label_name/dataset_name/labelsTr`, but `preparelabel.py` and `validation.py` only...
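
To make the expected layout concrete, here is a small check script for the directory structure described above; the helper itself and the example `label_name`/`dataset_name` values are hypothetical, only the `imagesTr`/`labelsTr` layout comes from the comment.

```python
from pathlib import Path

def check_layout(root: str, label_name: str, dataset_name: str) -> None:
    """Verify the layout `utils/prepare_nnunet.py` reportedly expects:
    <root>/<label_name>/<dataset_name>/imagesTr and .../labelsTr."""
    base = Path(root) / label_name / dataset_name
    for sub in ("imagesTr", "labelsTr"):
        folder = base / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing expected folder: {folder}")
        print(f"{folder}: {sum(1 for _ in folder.iterdir())} entries")

# Hypothetical invocation; substitute the real label and dataset names.
check_layout("data", "liver", "Task03_Liver")
```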

I have a question regarding the distillation loss. From the paper, the objective of the layer-wise progressive distillation process is described as $$E_x (\frac{1}{k} \sum_{i=1}^{k} \Vert f_{teacher}^{(2i)} (x) -...
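
For concreteness, here is a sketch of how a loss of that shape is typically computed, pairing student layer $i$ with teacher layer $2i$; completing the truncated formula with a student term $f_{student}^{(i)}(x)$ and choosing MSE as the norm are my assumptions, not the paper's confirmed definition.

```python
import torch.nn.functional as F

def progressive_distill_loss(student_feats, teacher_feats, k):
    """Average feature-matching loss over the first k student layers,
    pairing student layer i with teacher layer 2i (both 1-indexed).
    Each argument is a list of per-layer activations for the same input x."""
    loss = 0.0
    for i in range(1, k + 1):
        s = student_feats[i - 1]       # f_student^{(i)}(x)
        t = teacher_feats[2 * i - 1]   # f_teacher^{(2i)}(x)
        loss = loss + F.mse_loss(s, t)
    return loss / k
```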