
mmdet

Open Chop1 opened this issue 1 year ago • 10 comments

Hello,

Will this be integrated into the official mmdet library? Thanks

Chop1 avatar Jun 13 '23 12:06 Chop1

Due to my involvement in other heavy-load projects, integrating DDQ into mmdet may require assistance from the community.

jshilong avatar Jun 13 '23 13:06 jshilong

Hello, I have attempted to integrate DDQ-DETR into mmdet 3.0.0, and it runs successfully. However, during subsequent improvements, training was sometimes suddenly interrupted; the error output is in the attached screenshots. I could not find the reason for this and would like to ask whether anyone else has experienced a similar situation.

ironmanfcf avatar Jun 22 '23 01:06 ironmanfcf

These errors might be related to running a different branch or encountering a corner case. You can check all logical branches before this error. Also, I would like to know if the code in this repository can run without any issues. If DDQ-DETR in this repo is running smoothly, you should pay more attention to your modifications.

jshilong avatar Jun 22 '23 02:06 jshilong

Yes, DDQ-DETR only runs smoothly in this repository after some splitting and reference modifications to the code. Meanwhile, I am also using mmcv 2.0.0rc4, but under PyTorch 1.12.1 + CUDA 11.3.

In addition, I reduced the number of queries and applied DDQ-DETR to my own task, which has a higher instance density. I suspect this may affect the subsequent one-to-many assignment in the auxiliary loss calculation, and I think this may be the reason for the training interruption.

ironmanfcf avatar Jun 22 '23 03:06 ironmanfcf


Maybe you can directly remove the auxiliary loss of the decoder to verify that the code can run smoothly.

jshilong avatar Jun 22 '23 03:06 jshilong

If there is an extremely large number of instances in an image, please make sure the number of queries is enough to match the ground truth. Since the aux branch uses 4 positive samples for each GT, there should be at least max_gt * 4 queries.

jshilong avatar Jun 22 '23 03:06 jshilong
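
A quick way to check this constraint on a COCO-style dataset is sketched below. It is only a rough sanity check, not part of the original discussion; the annotation path and the `num_queries` value are placeholders to adjust for your own setup.

```python
# Rough sanity check for the constraint above: with 4 positive samples per GT
# in the auxiliary branch, the query budget should satisfy num_queries >= max_gt * 4.
# The annotation path and num_queries are placeholders for your own setup.
from collections import Counter

from pycocotools.coco import COCO

num_queries = 900       # query budget in your config (placeholder)
aux_pos_per_gt = 4      # positive samples assigned per GT by the aux branch

coco = COCO('data/coco/annotations/instances_train2017.json')  # placeholder path
gt_counts = Counter(ann['image_id'] for ann in coco.loadAnns(coco.getAnnIds()))
max_gt = max(gt_counts.values())

print(f'densest image has {max_gt} GTs -> needs at least {max_gt * aux_pos_per_gt} queries')
if num_queries < max_gt * aux_pos_per_gt:
    print('num_queries is too small for the auxiliary one-to-many assignment')
```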

Hello, during my experiments I found that, compared to DINO, DDQ-DETR has significantly increased training time and memory usage. By reading the code and experimenting, I found that the main changes likely responsible are the auxiliary detection head branch and its loss calculation, as well as the NMS operations added in the query selection stage and the decoder stage. I would like to ask:

1. Are there any other factors I haven't noticed that increase GPU memory usage and slow down training?
2. Which part or parts account for most of the extra training time and memory, and what is the approximate relationship between this overhead and the accuracy gain?

In the paper, I noticed that the appendix discusses the impact of changing the number of queries on GPU memory usage, but not the impact of the other improvements on training. Due to hardware limitations, answers to these questions would be very helpful for my subsequent experimental setup.

ironmanfcf avatar Jun 23 '23 01:06 ironmanfcf

The longer training time and higher memory usage are due to the 1.5 * 900 = 1350 auxiliary queries and their corresponding label assignment. If this is causing a heavy burden, you may consider removing the auxiliary branch in your experiment.

jshilong avatar Jun 23 '23 03:06 jshilong
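
For reference, the 1350 figure comes from a 1.5x dense-query ratio applied to the 900 distinct queries. The short sketch below just spells out that arithmetic; the `dense_topk_ratio` name is an assumption based on the public DDQ configs, so check your local config for the exact field.

```python
# Back-of-the-envelope count of the extra queries mentioned above.
# dense_topk_ratio is assumed to be the config field controlling the 1.5x
# dense (auxiliary) branch; verify the name against your DDQ config.
num_queries = 900          # distinct (main-branch) queries
dense_topk_ratio = 1.5     # ratio of auxiliary (dense) queries to distinct queries

num_aux_queries = int(dense_topk_ratio * num_queries)   # 1350 auxiliary queries
# If the dense queries are concatenated with the distinct ones during training
# (as described for DDQ-DETR), the decoder roughly processes this many queries:
total_train_queries = num_queries + num_aux_queries

print(f'{num_aux_queries} auxiliary queries, ~{total_train_queries} total during training')
```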

Both the training time and memory should only increase by around 5-10% compared to DINO. I have also encountered a heavy slowdown, with around 2x as much time per iteration as DINO, which was later found to be caused by an old version of scipy. In some versions of scipy (e.g. scipy==1.5), the maximum_bipartite_matching function takes an unexpectedly long time. You may try installing scipy>=1.7.3 if your problem is similar.

Johnson-Wang avatar Sep 07 '23 13:09 Johnson-Wang
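
To rule out this scipy slowdown locally, one can print the installed version and time maximum_bipartite_matching directly. The snippet below is a minimal sketch; the matrix size and density are arbitrary and only meant to expose the slow versions.

```python
# Quick check (sketch) for the scipy slowdown mentioned above: print the
# installed version and time maximum_bipartite_matching on a random sparse
# matrix. If this takes seconds rather than milliseconds, upgrading to
# scipy>=1.7.3 is worth trying.
import time

import scipy
from scipy.sparse import random as sparse_random
from scipy.sparse.csgraph import maximum_bipartite_matching

print('scipy version:', scipy.__version__)

graph = sparse_random(1350, 1350, density=0.05, format='csr', random_state=0)
start = time.perf_counter()
maximum_bipartite_matching(graph, perm_type='column')
print(f'maximum_bipartite_matching took {time.perf_counter() - start:.3f} s')
```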

Thank you very much. I will try it and reply to you later.

ironmanfcf avatar Sep 07 '23 13:09 ironmanfcf

It has been merged into the official mmdetection repo: https://github.com/open-mmlab/mmdetection/tree/main/configs/ddq

jshilong avatar Sep 13 '24 13:09 jshilong
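
Given the merge, a quick way to confirm the DDQ configs are present in a local mmdetection checkout is sketched below. The exact filenames under configs/ddq and the expected model type are assumptions to verify against your clone; run it from the mmdetection repo root.

```python
# Sketch: list the merged DDQ configs in a local mmdetection checkout and
# load one with mmengine. Filenames and the expected model type are
# assumptions; check configs/ddq in your clone for the exact names.
from pathlib import Path

from mmengine.config import Config

ddq_dir = Path('configs/ddq')
config_files = sorted(ddq_dir.glob('*.py'))
print('available DDQ configs:', [p.name for p in config_files])

cfg = Config.fromfile(str(config_files[0]))
print('model type:', cfg.model.type)   # expected to be the DDQ-DETR detector class
```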