all-seeing
Where do the bounding boxes used in creating the AS-V2 dataset come from?
Thank you for the excellent work on ASMv2. The paper mentions that when creating the AS-V2 dataset, the bounding boxes of objects are used as part of the prompt for GPT-4V, but it does not explain how these bounding boxes were obtained. Could you describe the workflow for acquiring them?