Simple Question about object detection code

Open AlexCo1d opened this issue 1 year ago • 1 comments

https://github.com/doc-doc/CoVGT/blob/cbc9fa7830b304f3c3f9c53040489ea9ad35a9aa/model/EncoderVid.py#L56-L71

Hi, I am reading your object detection code and found the snippet above in EncoderVid.py. Do you remember why you chose 5 dimensions (dim_bbox) for the positional embedding? Where does this convention come from (Faster R-CNN or Detectron)?

Thank you for your prompt response! Thanks for your great work!

Jan 05 '25 17:01 AlexCo1d

Hi, the fifth dimension denotes the relative bbox size: bbox_size / image_size, where image_size = w*h. It is based on my previous relation grounding work: https://github.com/doc-doc/vRGV.

Feb 23 '25 09:02 doc-doc
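For readers landing on this thread, here is a minimal sketch of how such a 5-dimensional box feature could be computed. Only the fifth dimension (box area divided by image area) comes from the reply above. The first four dimensions are assumed here to be corner coordinates normalized by image width and height, and the function name `bbox_feature_5d` is hypothetical, not taken from the CoVGT or vRGV code.

```python
import numpy as np


def bbox_feature_5d(boxes, img_w, img_h):
    """Build a 5-d feature per box (hypothetical helper, not from the repo).

    boxes: array-like of shape (N, 4) holding (x1, y1, x2, y2) in pixels.
    Dims 0-3: corners normalized by image width/height (an assumption).
    Dim 4:    relative box size, bbox_area / (img_w * img_h), as described
              in the reply above.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    x1, y1, x2, y2 = boxes.T
    rel_area = (x2 - x1) * (y2 - y1) / (img_w * img_h)
    return np.stack(
        [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, rel_area], axis=1
    )


# A 50x100 box in the top-left corner of a 100x200 image.
feats = bbox_feature_5d([[0, 0, 50, 100]], img_w=100, img_h=200)
print(feats.shape)  # (1, 5)
print(feats[0, 4])  # 0.25, i.e. the box covers a quarter of the image
```

A feature like this would then be passed through a linear layer with input size dim_bbox = 5 to produce the positional embedding; the exact layer used is in the linked lines of EncoderVid.py.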