Question about the object detection

Open Richar-Du opened this issue 1 year ago • 1 comments

When encoding the image into the prompt, you mention captions and bounding boxes. Which object detection model did you use to generate the bounding boxes?

Richar-Du avatar Apr 20 '23 02:04 Richar-Du

I think the bounding boxes come from the ground truth in the COCO dataset.

wanxinzzz avatar Apr 20 '23 03:04 wanxinzzz

Hi @Richar-Du, both annotations come from the original COCO dataset: captions from the coco-caption-2014 annotations, and boxes from the coco-instances-2014 annotations.
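
For readers who want to see what that looks like in practice, here is a minimal sketch of pairing the two annotation sources per image. It assumes the standard COCO JSON layout (an `annotations` list keyed by `image_id`, plus a `categories` list in the instances file). The helper names `load_json` and `collect_context` are hypothetical, and this is not necessarily how the LLaVA data pipeline does it.

```python
import json
from collections import defaultdict


def load_json(path):
    """Read one COCO annotation file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def collect_context(captions, instances):
    """Group ground-truth captions and boxes by image_id.

    `captions` and `instances` are the parsed caption and instance
    annotation files. Boxes keep COCO's [x, y, width, height] pixel format.
    """
    # Map numeric category ids to readable names, e.g. 1 -> "person".
    cat_names = {c["id"]: c["name"] for c in instances["categories"]}

    context = defaultdict(lambda: {"captions": [], "boxes": []})
    for ann in captions["annotations"]:
        context[ann["image_id"]]["captions"].append(ann["caption"].strip())
    for ann in instances["annotations"]:
        label = cat_names[ann["category_id"]]
        context[ann["image_id"]]["boxes"].append((label, list(ann["bbox"])))
    return dict(context)


# Usage (assumed file names from the COCO 2014 annotation download):
# captions = load_json("annotations/captions_train2014.json")
# instances = load_json("annotations/instances_train2014.json")
# context = collect_context(captions, instances)
```

No detector is involved in this sketch: every box is read directly from the human-labelled instance annotations.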

Thanks @wanxinzzz for answering!

haotian-liu avatar Apr 21 '23 00:04 haotian-liu

Got it, thanks for your explanations :)

Richar-Du avatar Apr 21 '23 07:04 Richar-Du