Open-GroundingDino

Predicted labels

Hasanmog opened this issue 1 year ago • 18 comments

Hello,

@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results. I printed out the labels being predicted during evaluation: they never match the ground truth, while the bounding boxes are quite good. I placed a label_list containing the categories in cfg_odvg.py. Any ideas/tips on where the source of the problem could be?

Hasanmog avatar Dec 20 '23 16:12 Hasanmog

I'm not sure, but if everything is configured correctly, there may be something wrong with the code. My two suggestions are: 1. Check whether all the parameters in the configuration file are correct. 2. Take a closer look at the evaluation code; there may be a problem there, but I am not sure.

Or, if you have any new findings or logs, could you provide them so we can analyze the specific problem?

BIGBALLON avatar Dec 23 '23 07:12 BIGBALLON

Hello,

@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results. I printed out the labels being predicted during evaluation: they never match the ground truth, while the bounding boxes are quite good. I placed a label_list containing the categories in cfg_odvg.py. Any ideas/tips on where the source of the problem could be?

I encountered the same problem. The mAP I got is low but the bounding boxes are quite good. Have you solved the problem?

Qia98 avatar Dec 26 '23 09:12 Qia98

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels.

Are you encountering the same issue too?

Hasanmog avatar Dec 27 '23 07:12 Hasanmog

@longzw1997 Any suggestions? It looks like there may be a small problem, but I don't know where it is.

  • The visual results are great, indicating that the training is effective.
  • But the evaluation mAP (including the predicted labels) is low, so it may be a problem with label_list or with the evaluation code.

BIGBALLON avatar Dec 27 '23 10:12 BIGBALLON

It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified accordingly?

longzw1997 avatar Dec 27 '23 12:12 longzw1997
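
For reference, a minimal sketch of the two options discussed above; the category names below are placeholders, not the real 20-class list, and the exact layout of cfg_odvg.py may differ:

```python
# Hypothetical excerpt from cfg_odvg.py -- placeholder class names only.
use_coco_eval = False  # use the custom label_list instead of the default COCO class names
label_list = [
    "class_00", "class_01", "class_02", "class_03", "class_04",
    "class_05", "class_06", "class_07", "class_08", "class_09",
    "class_10", "class_11", "class_12", "class_13", "class_14",
    "class_15", "class_16", "class_17", "class_18", "class_19",
]
# The list order is generally expected to line up with the category ids in the
# COCO-format test annotations; a mismatch here produces exactly the symptom
# described above (good boxes, wrong labels).
```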

It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified accordingly?

yes.

Hasanmog avatar Dec 27 '23 15:12 Hasanmog

It looks like the code did not import the correct class names during evaluation. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified accordingly?

yes.

So are the evaluation results normal now?

BIGBALLON avatar Dec 27 '23 15:12 BIGBALLON

No, I modified them from the beginning, but I still have the same issue. I don't know whether the problem is on my side or there is actually a bug in the code. Training is working normally; I compared the visualizations against the vanilla model. But during evaluation the scores are really low, and when I printed out the gt_label and the _res_label from the evaluation function, they never match. The bounding boxes, on the other hand, are good. So the label mismatch might be one of the reasons the scores are this low, which is why I asked @Qia98 whether he is also getting unmatched labels.

Hasanmog avatar Dec 27 '23 15:12 Hasanmog

I find that the evaluation result on COCO is the same whether using groundingdino_swint_ogc.pth or groundingdino_swinb_cogcoor.pth:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.709
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.610
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.407
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.706
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.784
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.638
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920

SamXiaosheng avatar Jan 04 '24 01:01 SamXiaosheng

@SamXiaosheng, I think it depends on the dataset you're using. If your dataset contains referring expressions, you will find that Groundingdino_swinb performs better because it was trained on RefCOCO, while the swint variant wasn't.

check this table

Hasanmog avatar Jan 06 '24 05:01 Hasanmog

No, I modified them from the beginning, but I still have the same issue. I don't know whether the problem is on my side or there is actually a bug in the code. Training is working normally; I compared the visualizations against the vanilla model. But during evaluation the scores are really low, and when I printed out the gt_label and the _res_label from the evaluation function, they never match. The bounding boxes, on the other hand, are good. So the label mismatch might be one of the reasons the scores are this low, which is why I asked @Qia98 whether he is also getting unmatched labels.

@Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote an evaluation script in COCO format and ran it against the official code base (I trained with this code to obtain the weights, then evaluated them with my evaluate function based on the official code). The mAP was very high (about 0.90 mAP@0.5 and 0.70 mAP@0.5:0.95), whereas in the training log mAP@0.5 never exceeded 0.1.

Qia98 avatar Jan 08 '24 02:01 Qia98

And I suspect that in this code base the output of the model is correct, but the _res_labels used to calculate the mAP are incorrect, so the problem may arise when converting the model output into the JSON-format _res_labels. For example, the xywh box conversion may be handled incorrectly.

Qia98 avatar Jan 08 '24 03:01 Qia98
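
If that suspicion is right, the step worth checking is where model boxes are turned into COCO result entries. Below is a minimal sketch of that conversion, assuming the model outputs normalized (cx, cy, w, h) boxes as DETR/GroundingDINO-style detectors usually do; it is not the repo's actual post-processing code:

```python
import torch

def boxes_to_coco_xywh(boxes_cxcywh: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    """Convert normalized (cx, cy, w, h) boxes to absolute COCO (x, y, w, h).

    `boxes_cxcywh` is assumed to be an (N, 4) tensor with values in [0, 1].
    Sketch only -- the repo's own conversion may live elsewhere.
    """
    cx, cy, w, h = boxes_cxcywh.unbind(-1)
    x = (cx - 0.5 * w) * img_w   # COCO wants the top-left corner, not the center
    y = (cy - 0.5 * h) * img_h
    return torch.stack([x, y, w * img_w, h * img_h], dim=-1)
```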

No, I modified them from the beginning, but I still have the same issue. I don't know whether the problem is on my side or there is actually a bug in the code. Training is working normally; I compared the visualizations against the vanilla model. But during evaluation the scores are really low, and when I printed out the gt_label and the _res_label from the evaluation function, they never match. The bounding boxes, on the other hand, are good. So the label mismatch might be one of the reasons the scores are this low, which is why I asked @Qia98 whether he is also getting unmatched labels.

@Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote an evaluation script in COCO format and ran it against the official code base (I trained with this code to obtain the weights, then evaluated them with my evaluate function based on the official code). The mAP was very high (about 0.90 mAP@0.5 and 0.70 mAP@0.5:0.95), whereas in the training log mAP@0.5 never exceeded 0.1.

Hi @Qia98, I agree with your viewpoint and, if you find the time, please feel free to create a pull request to address this issue. 😄

BIGBALLON avatar Jan 08 '24 09:01 BIGBALLON
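
For anyone who wants to cross-check the training-time numbers the way @Qia98 did, a standalone COCO evaluation with pycocotools looks roughly like this; the file paths are placeholders and the detections must already be in the COCO result format ({"image_id", "category_id", "bbox": [x, y, w, h], "score"}):

```python
# Minimal standalone COCO-style evaluation sketch (placeholder file paths).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations.json")             # ground truth in COCO format
coco_dt = coco_gt.loadRes("predictions.json")  # detections in COCO result format

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints an AP/AR table like the one posted earlier in this thread
```

If the numbers from such a script disagree with the training log, the model and boxes are fine and the in-repo evaluation path is the place to look.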

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels.

Are you encountering the same issue too?

Hi, I also encountered the same problem: when I visualize the detection results, I find that the locations of the bounding boxes are correct but the categories are usually wrong. Is there something wrong with BERT, or is it due to other reasons?

EddieEduardo avatar Jan 10 '24 03:01 EddieEduardo

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels.

Are you encountering the same issue too?

I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

junfengcao avatar Mar 12 '24 02:03 junfengcao
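
To make the post-processing suspicion concrete: in GLIP/GroundingDINO-style evaluation, per-query token logits have to be mapped back to category labels through a positive map (category → token positions in the prompt). The sketch below only illustrates that idea; it is not the PostProcess code from this repo, and the names and shapes are assumptions:

```python
import torch

def logits_to_labels(token_logits: torch.Tensor, positive_map: torch.Tensor):
    """Illustrative mapping from per-token logits to category labels.

    Assumed shapes (not taken from the repo):
      token_logits:  (num_queries, num_tokens), sigmoid scores between queries and text tokens
      positive_map:  (num_categories, num_tokens), 1 where a token belongs to that category's phrase
    """
    tokens_per_cat = positive_map.sum(-1).clamp(min=1)             # avoid division by zero
    cat_scores = token_logits @ positive_map.t() / tokens_per_cat  # (num_queries, num_categories)
    scores, labels = cat_scores.max(-1)                            # best category per query
    return scores, labels
```

If the positive map is built from a prompt whose token positions do not line up with the label_list order, the boxes can still be right while every label comes out wrong, which matches the symptom in this thread.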

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels. Are you encountering the same issue too?

I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

@junfengcao feel free to create a pull request 😄

BIGBALLON avatar Mar 12 '24 09:03 BIGBALLON

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels. Are you encountering the same issue too?

I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

@junfengcao feel free to create a pull request 😄

May I ask whether anyone has solved this problem? I've encountered the same issue and have troubleshot a number of possible causes, but have not solved it.

jaychempan avatar Apr 07 '24 15:04 jaychempan

@Qia98, when evaluating, are the predicted labels correct compared to the ground truth, or at least plausible? I printed out the labels and they never match the ground-truth labels. Are you encountering the same issue too?

I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).

@junfengcao feel free to create a pull request 😄

This happened to me too, with an mAP of only 0.2%. When I changed the dataset, the accuracy was more than 40%, but why did the mAP decrease as training went on?

caicaisy avatar Apr 15 '24 16:04 caicaisy