CoVGT
Simple Question about object detection code
https://github.com/doc-doc/CoVGT/blob/cbc9fa7830b304f3c3f9c53040489ea9ad35a9aa/model/EncoderVid.py#L56-L71
Hi, I am reading your object detection code and found the snippet above in your EncoderVid.py. Do you still remember why you chose 5 dimensions (dim_bbox) for the positional embedding? What is the source of this approach (Faster RCNN or Detectron)?
Thank you for your prompt response! Thanks for your great work!
Hi, the fifth dimension denotes the relative bbox size, i.e. bbox_size / image_size (w*h). It is based on my previous relation grounding work: https://github.com/doc-doc/vRGV.
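A minimal sketch of such a 5-d box feature follows. It assumes boxes arrive as (x1, y1, x2, y2) pixel corners and that the first four dimensions are those corners normalized by image width and height. The reply above only specifies the fifth dimension (box area over image area), so the layout of dims 0-3 is an assumption. The function name bbox_to_5d is hypothetical and is not taken from the CoVGT code.

```python
from typing import List, Sequence


def bbox_to_5d(box: Sequence[float], img_w: float, img_h: float) -> List[float]:
    """Build a 5-d location feature for one box given as (x1, y1, x2, y2) in pixels.

    Assumption (not confirmed by the thread): dims 0-3 are the box corners
    normalized by image width/height. Dim 4 is the relative bbox size
    described in the reply: box area / (w * h).
    """
    x1, y1, x2, y2 = box
    # Clamp to zero so a degenerate box never yields a negative area.
    box_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return [
        x1 / img_w,
        y1 / img_h,
        x2 / img_w,
        y2 / img_h,
        box_area / (img_w * img_h),  # fifth dimension: relative bbox size
    ]


if __name__ == "__main__":
    # A box covering the top-left quarter of a 640x480 frame.
    print(bbox_to_5d((0, 0, 320, 240), 640, 480))  # [0.0, 0.0, 0.5, 0.5, 0.25]
```

In a model, vectors like this would typically be mapped to the hidden size by a learned projection before being combined with region features; the exact way CoVGT does this is in the linked lines and is not reproduced here.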