Pengchuan Zhang
Hi @FingerRec and @coldmanck, please change the batch size to 1024, which is the batch size reported in the paper. Sorry, the batch size of 8 here is just...
@coldmanck Training resources: 16 V100 (32G) GPUs with a total batch size of 1024; training for 2M iterations takes about 20 weeks. We do not use AMP, and the 32G memory is sufficient. If you...
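For reference, the per-GPU batch size and total sample count implied by these numbers can be sketched as below. This is just illustrative arithmetic; the helper names are hypothetical and not part of the VinVL codebase:

```python
# Hypothetical sketch: split the 1024 total batch evenly across 16 GPUs
# and count the samples processed over 2M iterations.

def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    # Each GPU holds an equal slice of the global batch.
    assert total_batch % num_gpus == 0, "total batch must divide evenly"
    return total_batch // num_gpus

def samples_seen(total_batch: int, iterations: int) -> int:
    # Total training samples processed over the whole run.
    return total_batch * iterations

print(per_gpu_batch(1024, 16))        # → 64 images per V100
print(samples_seen(1024, 2_000_000))  # → 2,048,000,000 samples
```

So the 1024 total batch works out to 64 images per GPU, which is why 32G of memory suffices without AMP.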
@309018451 For the image tags, the image tags used in both NoCaps training and testing are generated from an OD model pretrained on OpenImages dataset, not from the model vinvl_vg_x152c4.pth....
@lucidrains I recently used your implementations of performer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/performer.py) and linformer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/linformer.py) to compare different efficient attention mechanisms on image classification and object detection tasks. See the results reported here:...
The checkpoint download link was updated:
# pretrained models at https://drive.google.com/file/d/1nvu8y4zZFbJqSqLdQClvMyzsbBeX-oaQ/view?usp=sharing
# the associated labelmap at https://drive.google.com/file/d/1M1nPtMPHS1GXx5HvKMMDxOksxQgwX14_/view?usp=sharing
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml \
    TEST.IMS_PER_BATCH 2 \
    MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth \
    MODEL.ROI_HEADS.NMS_FILTER 1 \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2 \
    DATA_DIR...
@ckzbullbullet Did you solve the problem?
It is generated by an object detection model trained on COCO. In fact, you can use tags generated by the VinVL models without any accuracy drop. The trick is that...
Yes, @CCYChongyanChen, "an" and "s" are the answers and scores; they are not used in the model, but they are used in the evaluation. The order does not matter, but it matters to only keep...
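A minimal sketch of how parallel "an" (answers) and "s" (scores) lists can be consumed order-independently during evaluation. Only the two key names come from the discussion above; the entry layout and the keep-highest-score rule for duplicates are illustrative assumptions, not the dataset's specification:

```python
# Hypothetical VQA-style annotation entry. Only the "an" and "s" keys are
# taken from the discussion; the values and dedup rule are assumptions.
entry = {"an": ["cat", "kitten", "cat"], "s": [1.0, 0.3, 1.0]}

def answer_scores(entry):
    # Pair each answer string with its score. Because evaluation looks
    # answers up by string, the order of the parallel lists is irrelevant.
    scores = {}
    for ans, sc in zip(entry["an"], entry["s"]):
        # Assumed policy: keep the highest score for a repeated answer.
        scores[ans] = max(scores.get(ans, 0.0), sc)
    return scores

print(answer_scores(entry))  # → {'cat': 1.0, 'kitten': 0.3}
```

Shuffling the two lists together leaves the resulting answer-to-score mapping unchanged, which is why the order does not matter.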
Please refer to the repo https://github.com/microsoft/scene_graph_benchmark for feature extraction. If you only find minor differences in the predictions (by inspecting them with the visualization tool tools/demo/demo_image.py), then the extracted features should be...