
Simple Question about object detection code

Open AlexCo1d opened this issue 1 year ago • 1 comments

https://github.com/doc-doc/CoVGT/blob/cbc9fa7830b304f3c3f9c53040489ea9ad35a9aa/model/EncoderVid.py#L56-L71

Hi, I am reading your object detection code and found the snippet above in your EncoderVid.py. Do you remember why you chose 5 dimensions (dim_bbox) for the positional embedding? Where does this approach come from (Faster R-CNN or Detectron)?

Thank you for your prompt response! Thanks for your great work!

AlexCo1d, Jan 05 '25 17:01

Hi, the fifth dimension denotes the relative bbox size: bbox_size / image_size (w*h). It is based on my previous relation grounding work: https://github.com/doc-doc/vRGV

doc-doc, Feb 23 '25 09:02
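
For readers following the thread, here is a minimal sketch of how such a 5-dimensional box feature could be built. Only the fifth entry (box area divided by image area, w*h) comes from the reply above. The first four entries being corner coordinates normalized by image width and height is an assumption, and the function name `bbox_to_5d` is hypothetical, not taken from the CoVGT or vRGV code.

```python
def bbox_to_5d(box, img_w, img_h):
    """Turn a pixel-space box (x1, y1, x2, y2) into a 5-dim position feature.

    Assumption: dims 0-3 are corner coordinates normalized by image size.
    From the thread: dim 4 is the relative box size, bbox_size / (w * h).
    """
    x1, y1, x2, y2 = box
    box_area = (x2 - x1) * (y2 - y1)
    return [
        x1 / img_w,
        y1 / img_h,
        x2 / img_w,
        y2 / img_h,
        box_area / (img_w * img_h),  # relative bbox size
    ]


# Example: a 200x200 box in a 640x480 frame.
feat = bbox_to_5d((100, 50, 300, 250), img_w=640, img_h=480)
print(len(feat), feat)
```

A feature like this would then be projected by a linear layer whose input size matches dim_bbox = 5.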