About Oscar/Oscar+ model
How do I use custom data for the Visual Grounding task?
Sorry for the late reply. We currently do not provide an easy-to-use interface for custom data. You can try converting your data into the RefCOCO format. The basic idea is to download the RefCOCO splits and add a new config file and new data files for your dataset, following the existing ones.
The details for the different data files:
- `yamls/xxx.yaml`: the main config file.
- `split/xxx.json`: contains the image name and text query. Make sure each entry includes `"file_name"`, `"height"`, `"width"`, `"id"`, and `"caption"`.
- `detections/xxx/dets.json`: the candidate boxes or segmentation masks. Make sure the box coordinates are in `[x0, y0, w, h]` format.
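As a rough illustration, here is a minimal sketch of writing these two data files. Only the field names and the `[x0, y0, w, h]` box format come from this thread; the overall layout (a list of entries in the split file, boxes keyed by image id in the detection file) and all concrete values are assumptions, so compare against the shipped RefCOCO files before relying on it.

```python
import json

# split/my_dataset.json: one entry per (image, text query) pair.
# The file name "my_dataset" and the example values are hypothetical.
split_entries = [
    {
        "file_name": "street_17.jpg",            # image file name
        "height": 333,
        "width": 500,
        "id": 17,                                # unique example id
        "caption": "the man in the red jacket",  # the text query
    },
]
with open("split/my_dataset.json", "w") as f:
    json.dump(split_entries, f)

# detections/my_dataset/dets.json: candidate boxes per image,
# each box as [x0, y0, w, h]. Keying by image id is an assumption.
dets = {
    "17": [
        [120.0, 45.0, 80.0, 160.0],
        [10.0, 200.0, 150.0, 90.0],
    ],
}
with open("detections/my_dataset/dets.json", "w") as f:
    json.dump(dets, f)
```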
For your image directory: please rename every image to `xxx_id.jpg`.
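If it helps, a throwaway script along these lines can do the renaming. The exact meaning of the `xxx_id.jpg` pattern is not spelled out in the thread, so the reading used here (original stem plus `_<id>` taken from the split file) is only an assumption; adjust it to match how the existing RefCOCO images are named.

```python
import json
import shutil
from pathlib import Path

# Assumption: images sit flat in one directory and should be renamed
# to "<stem>_<id>.jpg", with <id> read from the matching split entry.
image_dir = Path("images")
with open("split/my_dataset.json") as f:
    entries = json.load(f)

for entry in entries:
    src = image_dir / entry["file_name"]
    dst = image_dir / f"{src.stem}_{entry['id']}.jpg"
    if src.exists() and src != dst:
        shutil.move(src, dst)
```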
Thanks for your reply. I don't know much about the Visual Grounding task, so I want to make sure how to obtain the candidate boxes for `detections/xxx/dets.json`: from a detector, or from predefined annotations? I would appreciate your reply.
The candidates can be obtained from any object detector, such as Faster R-CNN. You can also use VinVL's feature extractor, i.e. the code under `prompt_feat`. To use the VinVL detector, you need to provide an `img_info.json` file, which is a Python dictionary like `{"xxx.jpg": {"width": 500, "height": 333}, ...}`. Then you can run `prompt_feat/cmds/gqa/_ext.sh`, modifying `DATA_DIR` and `OUTPUT_DIR` to your desired directories. After that, please run `python tools/ext_objects.py OUTPUT_DIR/predictions.tsv DETECTION_FILE`.
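For the `img_info.json` step, a small script like the following can generate the dictionary. Only the `{"xxx.jpg": {"width": ..., "height": ...}}` shape comes from the comment above; the directory and output paths are placeholders, so point them at your own `DATA_DIR` layout before running the extraction script and `tools/ext_objects.py`.

```python
import json
from pathlib import Path
from PIL import Image

# Build img_info.json: map each image file name to its width/height.
# "my_dataset/images" is a hypothetical location for your images.
image_dir = Path("my_dataset/images")
img_info = {}
for path in sorted(image_dir.glob("*.jpg")):
    with Image.open(path) as im:
        width, height = im.size  # PIL reports (width, height)
    img_info[path.name] = {"width": width, "height": height}

with open(image_dir.parent / "img_info.json", "w") as f:
    json.dump(img_info, f)
```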