
run train_net.py eval function and demo.py get different results

Open wangshuailpp opened this issue 2 years ago • 21 comments

Hi, I trained on the COCO dataset (only 100 images extracted for a quick test) and evaluated on those same 100 images, getting an AP of about 90. But when I run demo.py to view the masks on an image, the result is very bad. I have also run demo.py with the model you provide and got good masks. Thanks!

wangshuailpp avatar Jun 22 '22 12:06 wangshuailpp

Hi @wangshuailpp, thanks for your interest in SparseInst. Did you load the correct model and config when running inference with demo.py?

wondervictor avatar Jun 23 '22 06:06 wondervictor

I set the same model and config for test_net and demo. test_net gets a good AP (about 90), but demo produces very bad masks on the image. Thanks!

wangshuailpp avatar Jun 23 '22 06:06 wangshuailpp

Hi @wangshuailpp, it seems that you trained the model from scratch on only 100 images and evaluated it on the training set. The model might overfit those 100 images. What about the images used for visualization? Are they the same images used during testing?

wondervictor avatar Jun 23 '22 06:06 wondervictor

I'm sure that the test and training images are the same. But I found something strange: every model I trained is 400.4M (from model_0004999.pth to model_final.pth), but the model you provide is 133.7 (sparse_inst_r50vd_dcn_giam_aug_67dc06.pth).

wangshuailpp avatar Jun 23 '22 07:06 wangshuailpp

Maybe you need to check which trained model demo.py is using and whether the weights are loaded correctly. As for the second question, a training checkpoint contains the model weights, the optimizer state, and some other elements that are stored for resuming training. We remove the states that are not needed for inference from the released checkpoints to save space.
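A minimal sketch of that slimming step, not the authors' actual script. It assumes the checkpoint is a plain dictionary with the weights under a "model" key and the training-only state under keys such as "optimizer", "scheduler", and "iteration"; those key names and the helper name strip_training_state are assumptions for illustration.

```python
def strip_training_state(checkpoint):
    """Return a new checkpoint dict that keeps only the model weights.

    Assumption: the weights live under the "model" key; everything else
    (optimizer state, scheduler state, iteration counter) is only needed
    to resume training.
    """
    return {"model": checkpoint["model"]}


# Toy checkpoint standing in for a real training checkpoint.
full_checkpoint = {
    "model": {"backbone.conv1.weight": [0.1, 0.2]},
    "optimizer": {"state": {}, "param_groups": []},
    "scheduler": {"last_epoch": 4999},
    "iteration": 4999,
}

slim_checkpoint = strip_training_state(full_checkpoint)
print(sorted(slim_checkpoint))  # only the "model" entry remains
```

With PyTorch, the dictionary would typically be read with torch.load(path, map_location="cpu") and the slimmed version written back with torch.save. Optimizer state can be considerably larger than the weights themselves, which would be consistent with the size gap reported above.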

wondervictor avatar Jun 23 '22 08:06 wondervictor

Hi @wondervictor

I ran into the same problem. The training phase reports a high AP, but at inference the model performs badly. I trained on coco/train_2017 and evaluated on coco/val2017. I don't know why it gets such a high AP.

Backbone: the yolov5s backbone, including the FPN layer
lr: 0.00001
batch size: 16

[06/27 09:03:47] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 10.19 seconds.
[06/27 09:03:47] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[06/27 09:03:49] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 1.28 seconds.
[06/27 09:03:49] d2.evaluation.coco_evaluation INFO: Evaluation results for segm: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 49.467 | 87.500 | 48.812 | 32.919 | 49.825 | 69.343 |
[06/27 09:03:49] d2.evaluation.coco_evaluation INFO: Per-category segm AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 43.785 | bicycle      | 32.271 | car            | 42.777 |
| motorcycle    | 41.864 | airplane     | 53.666 | bus            | 64.478 |
| train         | 71.820 | truck        | 55.110 | boat           | 39.147 |
| traffic light | 41.711 | fire hydrant | 61.725 | stop sign      | 70.547 |
| parking meter | 65.880 | bench        | 38.417 | bird           | 31.598 |
| cat           | 75.043 | dog          | 66.143 | horse          | 41.768 |
| sheep         | 48.491 | cow          | 46.441 | elephant       | 59.619 |
| bear          | 78.057 | zebra        | 56.101 | giraffe        | 53.960 |
| backpack      | 36.774 | umbrella     | 47.956 | handbag        | 33.903 |
| tie           | 34.070 | suitcase     | 52.099 | frisbee        | 61.102 |
| skis          | 12.655 | snowboard    | 35.651 | sports ball    | 46.322 |
| kite          | 35.145 | baseball bat | 24.195 | baseball glove | 49.813 |
| skateboard    | 36.658 | surfboard    | 41.859 | tennis racket  | 52.250 |
| bottle        | 39.589 | wine glass   | 28.893 | cup            | 49.328 |
| fork          | 17.350 | knife        | 20.521 | spoon          | 21.177 |
| bowl          | 54.863 | banana       | 43.836 | apple          | 47.963 |
| sandwich      | 63.597 | orange       | 54.590 | broccoli       | 46.770 |
| carrot        | 40.702 | hot dog      | 50.465 | pizza          | 63.825 |
| donut         | 53.661 | cake         | 55.068 | chair          | 36.520 |
| couch         | 62.362 | potted plant | 43.600 | bed            | 76.547 |
| dining table  | 52.464 | toilet       | 72.657 | tv             | 66.814 |
| laptop        | 62.028 | mouse        | 60.983 | remote         | 38.988 |
| keyboard      | 63.289 | cell phone   | 43.698 | microwave      | 65.430 |
| oven          | 61.911 | toaster      | 70.000 | sink           | 55.822 |
| refrigerator  | 71.195 | book         | 26.437 | clock          | 64.107 |
| vase          | 51.646 | scissors     | 42.721 | teddy bear     | 59.757 |
| hair drier    | 49.020 | toothbrush   | 26.258 |                |        |
[06/27 09:03:49] d2.engine.defaults INFO: Evaluation results for coco/val2017 in csv format:
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: Task: segm
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: 49.4665,87.5005,48.8123,32.9194,49.8253,69.3428

fabro66 avatar Jun 27 '22 01:06 fabro66

Hi @fabro66, could you provide a model config with weights for me? This problem is strange.

wondervictor avatar Jun 27 '22 02:06 wondervictor

> Hi @fabro66, could you provide a model config with weights for me? This problem is strange.

OK!

model & configs & weights

Please check it!

fabro66 avatar Jun 27 '22 04:06 fabro66

Hi @fabro66, I've tested the model along with the weights through two scripts, i.e., train_net.py and test_net.py on my local machine (4 NVIDIA 3090 GPUs, PyTorch=1.9.1, cuda=11.1, detectron2=0.6)

  • train_net.py:
python train_net.py --config-file configs/instance/sparse_inst_y5s_giam.yaml --eval --num-gpus 4 MODEL.WEIGHTS sparseinst_y5s_backbone/weights/model_Final.pth 

outputs:

[06/27 14:54:39 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format: 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: Task: segm 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: 49.4720,87.5011,48.7639,32.9183,49.8236,69.3405 
  • test_net.py:
python test_net.py --config-file configs/instance/sparse_inst_y5s_giam.yaml MODEL.WEIGHTS sparseinst_y5s_backbone/weights/model_Final.pth 

outputs:

[06/27 15:03:19 d2.evaluation.testing]: copypaste: Task: segm 
[06/27 15:03:19 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl 
[06/27 15:03:19 d2.evaluation.testing]: copypaste: 49.4670,87.4931,48.7567,32.9138,49.8194,69.3319 
speed: 0.0159s FPS: 62.78 

The results look normal with both evaluation methods. Can you provide more details about the inconsistency between training and testing? BTW, I'm curious about the pre-trained weights used in the provided backbone, because it achieves promising results (49.5 mAP, 63 FPS). It's amazing.

wondervictor avatar Jun 27 '22 07:06 wondervictor

Hi @fabro66 and @wangshuailpp, I find that the demo.py outputs strange results and I'm going to fix it.

wondervictor avatar Jun 27 '22 07:06 wondervictor

> It seems that the results are normal through two ways of evaluation. Can you provide more details about the inconsistency between training and testing? BTW, I'm concerned about the pre-trained weights used in the provided backbone because it achieves promising results (49.5 mAP, 63 FPS). It's amazing.


Hi @wondervictor. The inconsistency I mean is the high AP in the training phase versus the bad performance in demo.py. I use yolov5s backbone weights pre-trained on the COCO dataset.

fabro66 avatar Jun 27 '22 07:06 fabro66

Hi @fabro66 and @wangshuailpp, I've fixed this problem! It was caused by a mix-up in INPUT_FORMAT, specifically "BGR" versus "RGB". In demo.py, images are loaded in RGB format at demo.py:L93 https://github.com/hustvl/SparseInst/blob/bd57455aa49c4cb37d66c77ccd477c7a5ebee444/demo.py#L93 but they are then converted to BGR format at detectron2/engine/defaults.py:L311 https://github.com/facebookresearch/detectron2/blob/224cd2318fdb45b5e22bbb861ee9711ee52c8b75/detectron2/engine/defaults.py#L311 which is the wrong step. To solve it, you can add another conversion:

predictions = self.predictor(image[:,:,::-1])

in sparse_inst/d2_predictor.py:L49.
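To illustrate why the extra flip helps, here is a toy sketch in plain Python. It is not detectron2's code: toy_predictor is a hypothetical stand-in that, as described above, assumes its input is BGR and reverses the channels when the model expects RGB. Images are nested lists of pixel tuples here; with NumPy arrays the same flip is written image[:, :, ::-1].

```python
def flip_channels(image):
    """Reverse the channel order of every pixel (RGB <-> BGR)."""
    return [[pixel[::-1] for pixel in row] for row in image]


def toy_predictor(image, input_format="RGB"):
    """Hypothetical stand-in for the predictor described above.

    It assumes the incoming image is BGR and flips the channels when the
    model expects RGB. Returns the pixels the model would actually see.
    """
    if input_format == "RGB":
        return flip_channels(image)
    return image


# A single pure-red pixel in RGB order, as demo.py is said to load it.
rgb_image = [[(255, 0, 0)]]

# Without the fix: the RGB image is flipped once, so the model sees BGR data.
seen_without_fix = toy_predictor(rgb_image)

# With the fix: flipping before the call cancels the predictor's own flip.
seen_with_fix = toy_predictor(flip_channels(rgb_image))

print(seen_without_fix, seen_with_fix)
```

The two flips cancel, so the model receives the channel order it was trained on.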

And I'll update the code to fix this bug.

wondervictor avatar Jun 27 '22 08:06 wondervictor

Hi @wondervictor. After I fixed the input format bug, the model still performs badly. Do you know where the problem is? I'm guessing that the model is overfitting because it uses yolov5s backbone weights pre-trained on the COCO dataset.

fabro66 avatar Jun 27 '22 08:06 fabro66

Hi @fabro66, the model you've trained achieves 49.5 AP on COCO val2017 and should perform well on the images. From the evaluation results, the problem does not seem to be due to overfitting. Have you compared the results before and after changing the image format?

wondervictor avatar Jun 27 '22 09:06 wondervictor

Hi @wondervictor. I have compared the results before and after changing the image format. Performance is better after the change, but it is still bad on the coco_val_2017 dataset.

image

image

image

image

image

image

image

fabro66 avatar Jun 27 '22 09:06 fabro66

Hi @wondervictor .

As long as I keep training sparseinst-yolonet, the AP keeps improving. On the validation dataset, some images are segmented well, even small people. In other images, the result is bad even when the person occupies a large area of the image. Is there something wrong with my hyperparameters? Could you help reproduce sparseinst-yolonet?

image

image

image

fabro66 avatar Jun 29 '22 01:06 fabro66

Hi @fabro66, I'd like to solve this problem, but it will take some time. I'm working on it. If you make any progress, please feel free to mention me in this issue : )

wondervictor avatar Jun 29 '22 10:06 wondervictor

Hi @fabro66, I've checked the visualization results on the training set and they are also bad. This problem is very strange. The AP on val2017 is 49.5, which is much higher than with ResNet-50. However, the visualization results are worse than those of the ResNet-based models. @fabro66, could you run the visualization with the pretrained ResNet-50 models to check whether the scripts work well in your environment? I'm going to re-train SparseInst with the yolo model. I'll notify you if I make any progress : )

wondervictor avatar Jul 10 '22 07:07 wondervictor

Hi @wondervictor. I used the pretrained ResNet-50 model for visualization on val2017 and got good performance, which confirms that my environment is OK. I'm not sure whether my batch size is too small (16 in my experiments). Looking forward to your progress!

fabro66 avatar Jul 10 '22 15:07 fabro66

Hi @fabro66, I am going to use yolov5s as the backbone. Would you please share your configs and models? The above link is dead. I will test it on my machine as well. Thanks.

xjsxujingsong avatar Jul 13 '22 13:07 xjsxujingsong

> Hi @fabro66 I am going to use yolov5s as backbone. Would you please share your configs and models. The above link is dead. I will test it on my machine as well. Thanks.

I also want to try it and am waiting too. Thanks

116022017144 avatar Mar 01 '23 09:03 116022017144