Reproduce ADE20k
Hi, thanks for sharing the code.
I'm trying to reproduce the results for 100-50 ADE20k.
Here are the hyper-parameters I used:
```
--pod local --pod_factor 0.001 --pod_logits --classif_adaptive_factor --init_balanced --unce --unkd
```
I get all-mIoU = 29.4%, which is much lower than the reported 34.5%. Could you please share the parameters you used to obtain the reported mIoU?
I don't think the authors have released the ADE20k part. I notice that `--pod_factor 0.001` has no effect, since the factor is set directly inside the function, as shown below. However, we do not know the correct value for ADE20k. Hoping the authors will address this soon.
https://github.com/zhangchbin/RCIL/blob/74026b2125f76d145ddf2aa1325386f6834bf315/train.py#L64-L69
https://github.com/zhangchbin/RCIL/blob/74026b2125f76d145ddf2aa1325386f6834bf315/train.py#L105-L109
@HieuPhan33 I adopt the same value for `pod_factor` as in PLOP. Note that RCIL does not use `features_distillation_channel` for the small logits map (1/16 in scale). Therefore, I give my proposed fix below; I'm not sure it will work, still validating.
- try replacing https://github.com/zhangchbin/RCIL/blob/74026b2125f76d145ddf2aa1325386f6834bf315/train.py#L64-L69 with:

```python
if i == len(list_attentions_a) - 1:
    # pod_factor = 0.0001  # cityscapes
    # pod_factor = 0.0005  # voc
    pod_factor = 0.00001  # ade
else:
    # pod_factor = 0.0005  # cityscapes
    # pod_factor = 0.01  # voc
    pod_factor = 0.001  # ade
```
- replace https://github.com/zhangchbin/RCIL/blob/74026b2125f76d145ddf2aa1325386f6834bf315/train.py#L105-L109 with:

```python
# pod_factor = 0.0005  # cityscapes
# pod_factor = 0.01  # voc
pod_factor = 0.001  # ade
```
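More generally, rather than editing `train.py` by hand for each dataset, one could look the factors up in a table keyed by dataset. This is only a sketch of the idea: the `POD_FACTORS` dict and `pod_factor_for` helper are mine, not part of the released code, and the cityscapes/voc values simply mirror the commented-out lines above:

```python
# Sketch: select pod_factor per dataset instead of hard-coding it.
# POD_FACTORS and pod_factor_for are hypothetical names; the values mirror
# the comments above, and the ADE pair is the guess proposed in this thread.
POD_FACTORS = {
    # dataset: (factor for the small 1/16 logits, factor for the other layers)
    "cityscapes": (0.0001, 0.0005),
    "voc": (0.0005, 0.01),
    "ade": (0.00001, 0.001),
}

def pod_factor_for(dataset: str, i: int, num_attentions: int) -> float:
    last_factor, mid_factor = POD_FACTORS[dataset]
    return last_factor if i == num_attentions - 1 else mid_factor
```

The second code location (train.py#L105-L109) would then use `POD_FACTORS[dataset][1]` directly, since it has no last-layer branch.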
Hi, sorry for the late reply; we are still not in the lab due to COVID, so we can't sort out the ADE script files right now. However, I can offer you a choice of factor for ADE20K: you can adjust the factors to (0.000005, 0.000005), which may achieve similar performance. I'll update the scripts as soon as I get back to the lab. Sorry about that.
Let me know if it doesn't work.
Hi, may I know the batch size you used? I am running on 2x RTX 3090; if I keep the batch size at 12, it runs out of memory. Additionally, what learning rate do you use if you reduce the batch size? Thanks.
I remember that we used 8x 1080 Ti to train the model on ADE20K, but I'm not sure about the batch size. @schuy1er
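If you do have to reduce the batch size, a common starting point (a generic heuristic, not something validated for RCIL) is to scale the learning rate linearly with it:

```python
# Linear-scaling heuristic (Goyal et al., 2017); a rough starting point only.
# base_lr and base_batch are placeholders for the repo's actual defaults.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * new_batch / base_batch

print(scaled_lr(base_lr=0.01, base_batch=12, new_batch=6))  # 0.005
```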
Hi, what do you mean by (0.000005, 0.000005)? Do you mean setting the factor to 0.000005 only for the small logits map (1/16 in scale), as shown below? And what is the factor for the intermediate-layer distillation, including both the channel-wise and spatial KD?
```python
if i == len(list_attentions_a) - 1:
    pod_factor = 0.000005  # ade
else:
    pod_factor = 0.001  # ade
```
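Or, reading the pair literally, do you mean setting both factors to 0.000005? That would be (my guess at the intended reading):

```python
# Alternative reading of (0.000005, 0.000005): both factors set to 0.000005.
if i == len(list_attentions_a) - 1:
    pod_factor = 0.000005  # ade, small logits (1/16 in scale)
else:
    pod_factor = 0.000005  # ade, intermediate layers
```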
I obtained the results below on ADE20k, with `pod_factor` set to 0.000005 for the small logits (1/16 in scale) and `kd_loss` set to 100.
| | 100-50 | | | 100-10 | | | 50-50 | | |
|---|---|---|---|---|---|---|---|---|---|
| | 1-100 | 101-150 | all | 1-100 | 101-150 | all | 1-50 | 51-150 | all |
| Reported | 42.3 | 18.8 | 34.5 | 39.3 | 17.6 | 32.1 | 48.3 | 25.0 | 32.5 |
| Reproduced | 42.4 | 13.3 | 32.7 | 36.2 | 13.2 | 28.3 | 47.4 | 18.8 | 28.3 |
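For reference, the "all" column is consistent with a class-count-weighted average over the old and new classes (100/50 here, 50/100 for the 50-50 setting). A quick check on my reproduced 100-50 row:

```python
# "all" mIoU = class-count-weighted mean of old and new mIoU (100-50 setting).
all_miou = (100 * 42.4 + 50 * 13.3) / 150
print(round(all_miou, 1))  # 32.7, matching the "Reproduced" row above
```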
Hi @Ze-Yang, can you reproduce the results on 100-50, 100-10 and 100-5? If so, can you share with me the hyper-parameters?
Hi @HieuPhan33, unfortunately, I cannot reproduce the reported results for any of the settings on ADE20k.
Hi @HieuPhan33 @Ze-Yang, I'm sorry for the late reply. We have fixed the bugs in the repository, and you can find the re-implemented results on VOC and ADE20K. If there are any questions, please feel free to let us know. Thanks for your help.