new image bbox

Open · 21157651 opened this issue 3 years ago · 13 comments

How do I get the 'bbox' field in BriVL/BriVL-code-inference/data/jsonls/example.jsonl?

21157651 — Aug 27 '21

The bboxes in these examples contain 100 ROIs each. How can Faster R-CNN be used to detect that many objects?

knaffe — Aug 31 '21

I used Detectron2 with the mask_rcnn_R_50_FPN_3x.yaml weights to get 100 candidate bboxes, but the coordinates are not exactly the same as those in example.jsonl. So I'd like to know whether the object detector used in this project could be provided, for complete reproduction.
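
For reference, a minimal sketch of the kind of Detectron2 setup I mean (not my exact script; the score threshold here is illustrative):

```python
# Extract up to 100 candidate boxes with Detectron2's model zoo weights.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.0  # keep low-confidence boxes too
cfg.TEST.DETECTIONS_PER_IMAGE = 100          # cap at 100 candidates

predictor = DefaultPredictor(cfg)
image = cv2.imread("example.jpg")
instances = predictor(image)["instances"]
boxes = instances.pred_boxes.tensor.cpu().numpy()  # (N, 4) as x1, y1, x2, y2
print(boxes.shape)
```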

MischaQI — Sep 01 '21

BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.

chuhaojin — Sep 01 '21

By the way, I have tested the AIC-ICC validation set against BriVL-API 1.0, but the retrieval result is very low (Recall@1 < 1%). I used your released retrieval code together with Faiss vector retrieval, but still got disappointing results. Could you release more details about this experiment from the paper?
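
The Faiss part of my pipeline looks roughly like this (a simplified sketch; the feature file names are placeholders, and I assume 5 captions per image in dataset order):

```python
# Text-to-image retrieval with Faiss: cosine similarity via inner
# product on L2-normalized features, then Recall@1.
import numpy as np
import faiss

img_feats = np.load("img_feats.npy").astype("float32")  # (N_img, d)
txt_feats = np.load("txt_feats.npy").astype("float32")  # (5 * N_img, d)
faiss.normalize_L2(img_feats)
faiss.normalize_L2(txt_feats)

index = faiss.IndexFlatIP(img_feats.shape[1])
index.add(img_feats)
_, top1 = index.search(txt_feats, 1)  # nearest image for each caption

gt_img = np.arange(len(txt_feats)) // 5  # ground-truth image per caption
print(f"t2i Recall@1: {(top1[:, 0] == gt_img).mean():.2%}")
```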

knaffe — Sep 01 '21

> BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.

Hi, I used BriVL-BUA-applications to get the bboxes. I modified the extract-bua-caffe-r101.yaml file, changing MAX_BOXES from 45 to 100, but the coordinates are not exactly the same as those in example.jsonl, so I am a little confused. I don't know what went wrong while using BriVL-BUA-applications. Could you take an image from example.jsonl and generate its bboxes with BriVL-BUA-applications? Thank you very much!

zgj-gutou — Sep 03 '21

> > BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.
>
> Hi, I used BriVL-BUA-applications to get the bboxes. I modified the extract-bua-caffe-r101.yaml file, changing MAX_BOXES from 45 to 100, but the coordinates are not exactly the same as those in example.jsonl, so I am a little confused. I don't know what went wrong while using BriVL-BUA-applications. Could you take an image from example.jsonl and generate its bboxes with BriVL-BUA-applications? Thank you very much!

Due to differences in library versions or machines, the bounding-box results will vary slightly; this does not affect the performance of BriVL. You can also compute the IoU between the two sets of bounding boxes to verify their correctness.
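
For example, a quick check could look like this (a minimal sketch, not code from this repo; boxes are assumed to be in x1, y1, x2, y2 format):

```python
# Compare two sets of boxes by matching each extracted box to its
# best-overlapping counterpart in example.jsonl.
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-8)

def mean_best_iou(boxes_mine, boxes_ref):
    return np.mean([max(iou(a, b) for b in boxes_ref) for a in boxes_mine])
```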

chuhaojin — Sep 03 '21

We just fixed a bug: the image size in cfg/test.yml must be changed to 380. Please pay attention to this when using BriVL; sorry for the inconvenience.

chuhaojin — Sep 03 '21

> > BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.
>
> Hi, I used BriVL-BUA-applications to get the bboxes. I modified the extract-bua-caffe-r101.yaml file, changing MAX_BOXES from 45 to 100, but the coordinates are not exactly the same as those in example.jsonl, so I am a little confused. I don't know what went wrong while using BriVL-BUA-applications. Could you take an image from example.jsonl and generate its bboxes with BriVL-BUA-applications? Thank you very much!

I can reproduce the same bboxes as those in example.jsonl.

troilus-canva — Sep 05 '21

> > > BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.
> >
> > Hi, I used BriVL-BUA-applications to get the bboxes. I modified the extract-bua-caffe-r101.yaml file, changing MAX_BOXES from 45 to 100, but the coordinates are not exactly the same as those in example.jsonl, so I am a little confused. I don't know what went wrong while using BriVL-BUA-applications. Could you take an image from example.jsonl and generate its bboxes with BriVL-BUA-applications? Thank you very much!
>
> I can reproduce the same bboxes as those in example.jsonl.

Hello, how did you do that? Can you tell me what you changed in the extract-bua-caffe-r101.yaml file? Thank you!

zgj-gutou — Sep 06 '21

> > > > BriVL uses the Bottom-Up Attention model as its object detector; this model can be obtained from BriVL-BUA-applications.
> > >
> > > Hi, I used BriVL-BUA-applications to get the bboxes. I modified the extract-bua-caffe-r101.yaml file, changing MAX_BOXES from 45 to 100, but the coordinates are not exactly the same as those in example.jsonl, so I am a little confused. I don't know what went wrong while using BriVL-BUA-applications. Could you take an image from example.jsonl and generate its bboxes with BriVL-BUA-applications? Thank you very much!
> >
> > I can reproduce the same bboxes as those in example.jsonl.
>
> Hello, how did you do that? Can you tell me what you changed in the extract-bua-caffe-r101.yaml file? Thank you!

I didn't change anything except the device, from cuda to cpu, since I'm running it on a Mac, and then ran the command mentioned in the README: `python3 bbox_extractor.py --img_path ../BriVL/data/imgs/baike_14014334_0.jpg --out_path test_data/test1.npz`.
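
To inspect what bbox_extractor.py wrote out, something like this works (the stored key names are a guess; check npz.files for the actual ones):

```python
# Peek into the .npz produced by bbox_extractor.py.
import numpy as np

npz = np.load("test_data/test1.npz")
print(npz.files)           # names of the stored arrays
boxes = npz[npz.files[0]]  # e.g. the (100, 4) bbox array
print(boxes.shape, boxes[:3])
```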

troilus-canva — Sep 07 '21

> By the way, I have tested the AIC-ICC validation set against BriVL-API 1.0, but the retrieval result is very low (Recall@1 < 1%). I used your released retrieval code together with Faiss vector retrieval, but still got disappointing results. Could you release more details about this experiment from the paper?

Hi, I got similar results to yours on the AIC-ICC validation set (30,000 images with 5 captions each): i2t R@1: 1.57%, t2i R@1: 0.48%. After going into the details, I found that the model does produce some reasonable results, e.g.: [screenshot: Screenshot_from_2021-10-19_17-42-47]

The highlighted text at the bottom-left is the query text, and the ground-truth image is above the text. The three images on the right are the top-3 images matched by the model. However, as the example shows, the model only matches the words "裙子" (skirt) and "女孩" (girl) and ignores the other information, which severely affects the recall.

Moreover, I found another paper (https://arxiv.org/abs/2109.04699v2) that ran the same evaluation on the AIC-ICC dataset. They mention that they conducted their experiments on the "test subset" of AIC-ICC, which contains only 10,000 samples, and the results they report for the WenLan model are similar to those in the WenLan paper. But the validation set contains 30,000 images and 150,000 captions. [screenshot: E-CLIP_dataset_detail]

Could the authors @chuhaojin provide more details about the test set and any pre-processing procedures? Many thanks!
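
For reference, the recall computation I mean looks roughly like this (a sketch, assuming the 5 captions of each image are stored consecutively in dataset order):

```python
# Recall@k for both directions from a caption-by-image similarity matrix.
import numpy as np

def recall_at_k(sim, k=1):
    """sim: (5 * N_img, N_img) text-to-image similarity matrix."""
    n_txt, n_img = sim.shape
    gt = np.arange(n_txt) // 5                 # ground-truth image per caption
    top_t2i = np.argsort(-sim, axis=1)[:, :k]  # top-k images per caption
    t2i = np.mean([gt[i] in top_t2i[i] for i in range(n_txt)])
    # i2t: an image is a hit if any of its 5 captions is in its top-k.
    top_i2t = np.argsort(-sim.T, axis=1)[:, :k]
    i2t = np.mean([np.isin(top_i2t[j], np.arange(5 * j, 5 * j + 5)).any()
                   for j in range(n_img)])
    return i2t, t2i
```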

Qiulin-W — Oct 19 '21

@Qiulin-W @knaffe @chuhaojin The following results were obtained on the AIC-ICC validation set using the code of this repo. I can confirm that my processed jsonl files are exactly the same as the file provided in the example. [screenshot: results]

This result is far inferior to the one in the paper. Any suggestions?

@huang-xx @knaffe Sorry, I don't know more about the evaluation details of the BriVL model. You can ask the student in the Model Development Group (@moonlitt, who is in charge of this part) for more details.

chuhaojin — Nov 18 '21

@moonlitt Hello, my evaluation results (i2t R@1: 1.09%; t2i R@1: 0.37%) on the AIC-ICC validation set (I used 30,000 samples) are also far from the results in the paper. Could you please share the evaluation code as a reference?

jim4399266 — Nov 29 '21