LLaVA Question about the object detection

Open Richar-Du opened this issue 1 year ago • 1 comments

When encoding the image into the prompt, you mention captions and bounding boxes. Which object detection model did you use to generate the bounding boxes?

Apr 20 '23 02:04 Richar-Du

I think the bounding boxes come from the ground-truth annotations in the COCO dataset.

Apr 20 '23 03:04 wanxinzzz

Hi @Richar-Du, both annotations come from the original COCO dataset: captions from the coco-caption-2014 annotations, and boxes from the coco-instances-2014 annotations.

Thanks @wanxinzzz for answering!

Apr 21 '23 00:04 haotian-liu
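
For anyone who wants to see what this looks like in practice, here is a minimal sketch of pulling the ground-truth captions and boxes for each image out of the two COCO annotation files. It is not the LLaVA authors' code, and it does not reproduce their prompt format. The helper names (`load_json`, `collect_image_context`) are hypothetical, and the file paths in the usage comment are assumptions about where the 2014 annotation files were unpacked. The only things it relies on are the standard COCO JSON fields (`annotations`, `categories`, `image_id`, `caption`, `category_id`, `bbox`).

```python
import json
from collections import defaultdict


def load_json(path):
    """Read one COCO annotation file into a dict (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def collect_image_context(captions_data, instances_data):
    """Group ground-truth captions and labeled boxes by image_id.

    Hypothetical helper, not part of LLaVA or pycocotools.
    """
    # Map numeric category ids to readable names, e.g. 18 -> "dog".
    category_names = {c["id"]: c["name"] for c in instances_data["categories"]}

    context = defaultdict(lambda: {"captions": [], "boxes": []})

    # The captions file holds one entry per (image, caption) pair.
    for ann in captions_data["annotations"]:
        context[ann["image_id"]]["captions"].append(ann["caption"].strip())

    # The instances file stores each box as [x, y, width, height] in pixels.
    # Converting to corner form [x1, y1, x2, y2] is a choice made for this sketch.
    for ann in instances_data["annotations"]:
        x, y, w, h = ann["bbox"]
        name = category_names[ann["category_id"]]
        context[ann["image_id"]]["boxes"].append((name, [x, y, x + w, y + h]))

    return dict(context)


# Usage (paths are assumptions about the local layout):
#   captions = load_json("annotations/captions_train2014.json")
#   instances = load_json("annotations/instances_train2014.json")
#   context = collect_image_context(captions, instances)
```

So no detector is involved: every box is a human-annotated COCO instance joined to its captions through `image_id`.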

Got it, thanks for your explanations :)

Apr 21 '23 07:04 Richar-Du
