CLIMS
CLIMS copied to clipboard
How to obtain pre-trained baseline CAM
Hi, I have a question about how to obtain the pre-trained baseline CAM (res50_cam.pth).
Currently this repo directly provides the checkpoint, and I'd like to know how this model is trained. Could you please explain?
And if I want to use CLIMS on a custom dataset, do I need to retrain the res50_cam.pth based on my custom dataset? Thanks!
Hi, setting train_cam_pass=True, train_clims_pass=False to train get baseline cam. It is just supervised by a multi-label classification loss.
I see, so if I want to use CLIMS on my custom dataset, I will need to re-train the res50_cam.pth model with a multi-label classification loss, since the provided checkpoint is trained only on the PASCAL VOC 2012, right?
Exactly. Better with pre-training,
Thanks for the explanation! I will try this.
Hi, I trained the res50_cam.pth on the PASCAL VOC dataset, and used it to train the CLIMS on PASCAL VOC. The training process went well. But the CLIMS performance that uses the res50_cam.pth trained by myself is worse than the one that uses the res50_cam.pth provided by you, i.e., 55.38% vs. 58.27% on mIoU.
My training command for training the res50_cam.pth is:
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root data/VOC2012/ --hyper 10,24,1,0.2 --cam_eval_thres 0.15 --work_space clims_voc12_for_res50_cam --cam_network net.resnet50_clims --train_cam_pass True --train_clims_pass False --make_clims_pass False --eval_cam_pass False
The log for training res50_cam.pth is train_res50_cam_log.txt, The log for training CLIMS with my res50_cam.pth is train_clims_my_res50_cam.txt. The log for training CLIMS with the provided res50_cam.pth is train_clims_provided_res50_cam.txt. Could you check, thanks!
How about the performance of your 'res50_cam.pth' on train set?
I directly copy the mIoU reported in the train_clims_my_res50_cam.txt and train_clims_provided_res50_cam.txt, from the step.eval_cam part. I'm not sure whether the mIoU is from training set or val set.
For your reference, I have shared my res50_cam.pth here. Hope this is helpful for the question.
You can evaluate your res50_cam.pth on the 'train' set to get mIoU. The scripts are similar to evaluate CLIMS.
For evaluation, I guess you mean the step/eval_cam.py file. This file requires the cam result as input, which I guess is generated by step/make_cam.py file. So I should use step/make_cam.py to first generate cam result with my res50_cam.pth (without CLIMS), then use step/eval_cam.py to evaluate on the train set , is that right?
exactly.
Hi, I just evaluated my res50_cam.pth (without CLIMS) on the train set, and the mIoU is 47.89%. Could you have a look?
My command for generating CAM using my res50_cam.pth:
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root data/VOC2012/ --hyper 10,24,1,0.2 --work_space clims_voc12_for_res50_cam_eval --cam_network net.resnet50_cam --make_cam_pass True
My command for evaluating the CAM of my res50_cam.pth on train set:
CUDA_VISIBLE_DEVICES=0 python run_sample.py --voc12_root data/VOC2012/ --hyper 10,24,1,0.2 --work_space clims_voc12_for_res50_cam_eval --eval_cam_pass True
Could you evaluate my 'res50_cam.pth' on train set? Maybe the mIoU of the initial model will help.
Hi, I have evaluated the res50_cam.pth (without CLIMS) model provided by you on the train set, and the mIoU is 48.11%.
The evaluation log for my res50_cam.pth (without CLIMS):
{'num_workers': 12, 'voc12_root': 'data/VOC2012/', 'train_list': 'voc12/train_aug.txt', 'val_list': 'voc12/val.txt', 'infer_list': 'voc12/train_aug.txt', 'chainer_eval_set': 'train', 'cam_network': 'net.resnet50_cam', 'feature_dim': 2048, 'cam_crop_size': 512, 'cam_batch_size': 16, 'cam_num_epoches': 5, 'cam_learning_rate': 0.1, 'cam_weight_decay': 0.0001, 'cam_eval_thres': 0.15, 'cam_scales': (1.0, 0.5, 1.5, 2.0), 'num_cores_eval': 8, 'clims_network': 'net.resnet50_clims', 'clims_num_epoches': 15, 'clims_learning_rate': 0.00025, 'hyper': '10,24,1,0.2', 'clip': 'ViT-B/32', 'conf_fg_thres': 0.3, 'conf_bg_thres': 0.1, 'irn_network': 'net.resnet50_irn', 'irn_crop_size': 512, 'irn_batch_size': 32, 'irn_num_epoches': 3, 'irn_learning_rate': 0.1, 'irn_weight_decay': 0.0001, 'beta': 10, 'exp_times': 8, 'sem_seg_bg_thres': 0.2, 'work_space': 'clims_voc12_for_res50_cam_eval', 'log_name': 'clims_voc12_for_res50_cam_eval/sample_train_eval', 'cam_weights_name': 'clims_voc12_for_res50_cam_eval/res50_cam.pth', 'irn_weights_name': 'clims_voc12_for_res50_cam_eval/res50_irn.pth', 'cam_out_dir': 'clims_voc12_for_res50_cam_eval/cam_mask', 'ir_label_out_dir': 'clims_voc12_for_res50_cam_eval/ir_label', 'sem_seg_out_dir': 'clims_voc12_for_res50_cam_eval/sem_seg', 'ins_seg_out_dir': 'clims_voc12_for_res50_cam_eval/ins_seg', 'clims_weights_name': 'clims_voc12_for_res50_cam_eval/res50_clims', 'train_cam_pass': False, 'train_clims_pass': False, 'make_cam_pass': False, 'make_clims_pass': False, 'eval_cam_pass': True, 'cam_to_ir_label_pass': False, 'train_irn_pass': False, 'make_ins_seg_pass': False, 'eval_ins_seg_pass': False, 'make_sem_seg_pass': False, 'eval_sem_seg_pass': False}
step.eval_cam: Thu Feb 23 20:45:37 2023
{'aeroplane': 0.3878330494895696, 'bicycle': 0.277048223235467, 'bird': 0.4008208719410258, 'boat': 0.302372916148759, 'bottle': 0.46038248257484954, 'bus': 0.6441010496947992, 'car': 0.52437011459361, 'cat': 0.5810356518108325, 'chair': 0.2503940042009332, 'cow': 0.5563700992609669, 'dining table': 0.41537679467596306, 'dog': 0.5097696192448804, 'horse': 0.52340754056787, 'motorbike': 0.6070907393004206, 'player': 0.48949676891373317, 'potted plant': 0.4091623130159273, 'sheep': 0.5904093659220163, 'sofa': 0.472986569753616, 'train': 0.4686553897256759, 'tv monitor': 0.41682748577263534}
threshold: 0.15 miou: 0.4789238743955983 i_imgs 1464
among_predfg_bg 0.36928326345424084
The evaluation log for the res50_cam.pth (without CLIMS) model provided by you:
{'num_workers': 12, 'voc12_root': 'data/VOC2012/', 'train_list': 'voc12/train_aug.txt', 'val_list': 'voc12/val.txt', 'infer_list': 'voc12/train_aug.txt', 'chainer_eval_set': 'train', 'cam_network': 'net.resnet50_cam', 'feature_dim': 2048, 'cam_crop_size': 512, 'cam_batch_size': 16, 'cam_num_epoches': 5, 'cam_learning_rate': 0.1, 'cam_weight_decay': 0.0001, 'cam_eval_thres': 0.15, 'cam_scales': (1.0, 0.5, 1.5, 2.0), 'num_cores_eval': 8, 'clims_network': 'net.resnet50_clims', 'clims_num_epoches': 15, 'clims_learning_rate': 0.00025, 'hyper': '10,24,1,0.2', 'clip': 'ViT-B/32', 'conf_fg_thres': 0.3, 'conf_bg_thres': 0.1, 'irn_network': 'net.resnet50_irn', 'irn_crop_size': 512, 'irn_batch_size': 32, 'irn_num_epoches': 3, 'irn_learning_rate': 0.1, 'irn_weight_decay': 0.0001, 'beta': 10, 'exp_times': 8, 'sem_seg_bg_thres': 0.2, 'work_space': 'clims_voc12_for_res50_cam_eval_ori_model', 'log_name': 'clims_voc12_for_res50_cam_eval_ori_model/sample_train_eval', 'cam_weights_name': 'clims_voc12_for_res50_cam_eval_ori_model/res50_cam.pth', 'irn_weights_name': 'clims_voc12_for_res50_cam_eval_ori_model/res50_irn.pth', 'cam_out_dir': 'clims_voc12_for_res50_cam_eval_ori_model/cam_mask', 'ir_label_out_dir': 'clims_voc12_for_res50_cam_eval_ori_model/ir_label', 'sem_seg_out_dir': 'clims_voc12_for_res50_cam_eval_ori_model/sem_seg', 'ins_seg_out_dir': 'clims_voc12_for_res50_cam_eval_ori_model/ins_seg', 'clims_weights_name': 'clims_voc12_for_res50_cam_eval_ori_model/res50_clims', 'train_cam_pass': False, 'train_clims_pass': False, 'make_cam_pass': False, 'make_clims_pass': False, 'eval_cam_pass': True, 'cam_to_ir_label_pass': False, 'train_irn_pass': False, 'make_ins_seg_pass': False, 'eval_ins_seg_pass': False, 'make_sem_seg_pass': False, 'eval_sem_seg_pass': False}
step.eval_cam: Fri Feb 24 00:39:27 2023
{'aeroplane': 0.43776330336844194, 'bicycle': 0.2898690744920993, 'bird': 0.42104697295496535, 'boat': 0.35934724517947014, 'bottle': 0.4474912544130493, 'bus': 0.6073325975418234, 'car': 0.5224978877557286, 'cat': 0.4224395073545029, 'chair': 0.27233237467560495, 'cow': 0.5737923737175804, 'dining table': 0.38645536061061486, 'dog': 0.4528349697951786, 'horse': 0.5047025024011766, 'motorbike': 0.6040080714806751, 'player': 0.5333809823476624, 'potted plant': 0.44080811162720984, 'sheep': 0.622085347911287, 'sofa': 0.44851732806495437, 'train': 0.5151142389572412, 'tv monitor': 0.44854477412047417}
threshold: 0.15 miou: 0.4811906300786135 i_imgs 1464
among_predfg_bg 0.3087233065778073
Hi, sorry for the late reply. The mIoU of reproduced baseline cam is lower than that of this reponsitory, e.g, aeroplane and boat. I'm not sure whether it is the reason for the lower performance of CLIMS you reproduced. I will try to re-train baseline CAM and then train the CLIMS.