Fail to reproduce the results of GZLSS / GFLSS on PASCAL VOC
I had trouble setting up the experimental environment on the public server, so I used a stronger baseline that reaches 59.8 mIoU after fine-tuning on the novel classes with five samples per class.
However, replacing the classifier weights with class label embeddings (word2vec + fasttext) does not reproduce the result reported in Figure 5(b) (about 75 mIoU); it only yields results close to the baseline (60.1 mIoU), even after fine-tuning on the novel classes and tuning the calibration parameter (threshold=0.5 works best in my case).
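For reference, this is roughly how I prepared the class embeddings: concatenate the per-class word2vec and fasttext vectors and L2-normalize. This is just my own sketch; the .npy file names are placeholders, and the normalization step is my assumption (check the repo's preprocessing to be sure):

```python
import numpy as np

# Placeholder paths: assume 300-d per-class vectors were extracted beforehand.
w2v = np.load("word2vec_pascal.npy")      # (num_classes, 300) word2vec vectors
ftt = np.load("fasttext_pascal.npy")      # (num_classes, 300) fasttext vectors

# Concatenate the two embeddings into one 600-d vector per class and
# L2-normalize each row (my assumption; verify against the repo).
emb = np.concatenate([w2v, ftt], axis=1)  # (num_classes, 600)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
```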
Has anyone successfully reproduced the GFLSS results on PASCAL VOC using DeepLabV2 or another base framework?
I got 48.8% mIoU in the ZLSS setting on PASCAL VOC, which is almost the same as the 49.5% reported in their paper. I did not try the GFLSS setting.
@qiang92 I also got close results (around 49) in the ZLSS setting but failed to reproduce the GFLSS setting. Replacing the normal weights with the word embeddings does not improve performance.
@qiang92 You can simply replace the weights of the last layer of DeepLab with the word embeddings, then train and fine-tune to try to reproduce the results. During inference, use the threshold to alleviate the bias towards the base classes, as shown in the SPNet code. However, this did not work for me.
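To make "replace the weights" concrete, here is a minimal PyTorch sketch of the idea: the last 1x1 conv of DeepLab is rebuilt from the frozen word-embedding matrix. This is my own illustration, not the exact SPNet code, and make_semantic_projection is a name I made up:

```python
import torch
import torch.nn as nn

def make_semantic_projection(embeddings: torch.Tensor) -> nn.Conv2d:
    """Build a 1x1 conv whose weights are the (frozen) class embeddings.

    embeddings: (num_classes, embed_dim) word-embedding matrix.
    The backbone must output embed_dim feature channels.
    """
    num_classes, embed_dim = embeddings.shape
    proj = nn.Conv2d(embed_dim, num_classes, kernel_size=1, bias=False)
    proj.weight.data.copy_(embeddings.view(num_classes, embed_dim, 1, 1))
    proj.weight.requires_grad = False  # keep the embeddings fixed while training
    return proj

# e.g. model.classifier = make_semantic_projection(emb_seen) for training,
# then swap in the full (seen + novel) matrix for GZLSS/GFLSS inference.
```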
@happycoding1996 I cannot reproduce their GZLSS results on either the VOC12 or the COCO dataset. My best harmonic-mean mIoU in the GZLSS setting on VOC is 29.35, which is far from their reported 42.45.
@qiang92 Have you tried a threshold to make the scores of the base classes smaller? The calibration can significantly boost the model's performance on the novel classes. I did not reproduce the GZLSS results, but my reproduced GFLSS results were also far behind those reported in the paper.
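For anyone unfamiliar with the calibration: my reading of the eval code is that the threshold simply penalizes the seen-class scores before the per-pixel argmax, so novel classes get a chance to win. A rough sketch under that assumption (the repo's exact rule may differ):

```python
import torch

def calibrated_prediction(scores: torch.Tensor,
                          seen_ids: list,
                          threshold: float) -> torch.Tensor:
    """Per-pixel argmax after penalizing the seen (base) classes.

    scores: (num_classes, H, W) softmax probabilities over all classes.
    seen_ids: indices of the base classes.
    threshold: constant subtracted from seen-class scores, so a novel
        class is predicted unless a seen class clearly dominates.
    """
    calibrated = scores.clone()
    calibrated[seen_ids] -= threshold
    return calibrated.argmax(dim=0)  # (H, W) label map
```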
@subhc Could you share more experimental details about reproducing the GFLSS and GZLSS results? Thanks!
Hi, I am not sure why this is happening; I believe the configs given in the config folder should be sufficient. These are the results from when I last ran this codebase: voc12_json.zip. You can find the checkpoint files here.
Thank you for sharing. However, I cannot reproduce your calibrated GZLSS results using the checkpoint you provided.
I can get the same ZLSS and GZLSS results written in your JSONs using your checkpoint_20000.pth.tar file.
But when I try to calibrate the GZLSS results, sweeping thresholds from 0.1 to 0.95, I cannot get the results written in your paper (a harmonic mean of 47.45). My best result is a harmonic mean of 28.18 at threshold=0.9, using the checkpoint you provided.
I run the evaluation with the command from your run_pascal.sh file:
CUDA_VISIBLE_DEVICES=$1 python eval.py --config config/voc12/ZLSS.yaml --imagedataset voc12 --model-path logs/voc12/myexp/checkpoint_20000.pth.tar -r gzlss --threshold 0.6
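To sweep the thresholds I just wrapped that command in a small Python loop, something like this (a throwaway sketch mirroring the command above):

```python
import subprocess

# Sweep calibration thresholds 0.10 .. 0.95 for GZLSS eval.
for i in range(18):
    t = round(0.1 + 0.05 * i, 2)
    subprocess.run([
        "python", "eval.py",
        "--config", "config/voc12/ZLSS.yaml",
        "--imagedataset", "voc12",
        "--model-path", "logs/voc12/myexp/checkpoint_20000.pth.tar",
        "-r", "gzlss",
        "--threshold", str(t),
    ], check=True)
```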
Could you tell me how to get your calibrated GZLSS results?
Thank you again.
@happycoding1996 @qiang92 I am running into the same problem! I cannot get the calibrated GZLSS results using the checkpoint the author provided. Could you tell me how to get the correct calibrated GZLSS results and the right threshold?
@sunycl I still fail to get the reported results; there is a significant performance gap.
@sunycl I haven't made any progress on this project so far, because the best results I reproduced were not as good as those claimed in the previous comments. I have given up.
@subhc @happycoding1996 @qiang92 The checkpoint link appears to have expired. Could you share the checkpoint again? Thanks.
@happycoding1996 Hello, I can't even get a harmonic mean of 28.18. How many GPUs did you use? What batch size? And could you provide the init model for PASCAL VOC? Thank you very much!