LLaVA
Question about the object detection
When encoding the image into the prompt, you mentioned captions and bounding boxes. Which object detection model did you use to generate the bounding boxes?
I think the bounding boxes come from the ground truth in the COCO dataset.
Hi @Richar-Du, both annotations come from the original COCO dataset: captions from the coco-caption-2014 annotations, and boxes from the coco-instances-2014 annotations.
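For anyone who wants to look at those annotations directly, here is a minimal sketch of grouping captions and ground-truth boxes by image. It assumes the standard COCO JSON layout (`annotations` entries with `image_id`, `caption`, `bbox`, `category_id`, plus a `categories` list). The function name `build_image_context` and the sample values are hypothetical, and this is not necessarily how the LLaVA data pipeline does it.

```python
from collections import defaultdict


def build_image_context(captions_json, instances_json):
    """Group COCO captions and ground-truth boxes by image_id (illustrative helper)."""
    # Map category ids to readable names, e.g. 18 -> "dog".
    id_to_name = {c["id"]: c["name"] for c in instances_json["categories"]}
    context = defaultdict(lambda: {"captions": [], "boxes": []})

    for ann in captions_json["annotations"]:
        context[ann["image_id"]]["captions"].append(ann["caption"].strip())

    for ann in instances_json["annotations"]:
        # COCO stores boxes as [x, y, width, height] in pixels;
        # convert to [x_min, y_min, x_max, y_max] here.
        x, y, w, h = ann["bbox"]
        context[ann["image_id"]]["boxes"].append(
            (id_to_name[ann["category_id"]], [x, y, x + w, y + h])
        )
    return dict(context)


# Tiny in-memory stand-ins for the two annotation files (made-up values).
captions = {"annotations": [{"image_id": 1, "caption": "A dog on a couch. "}]}
instances = {
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0]}
    ],
}
result = build_image_context(captions, instances)
print(result[1])
```

With the real files you would pass in the parsed contents of the caption and instance annotation JSONs (loaded with `json.load`) instead of the stand-in dictionaries. No detector is involved at any point, since the boxes are read straight from the annotations.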
Thanks @wanxinzzz for answering!
Got it, thanks for your explanations :)