Pengchuan Zhang
Hi @FingerRec and @coldmanck, please change the batch size to 1024, which is the batch size reported in the paper. Sorry, the batch size of 8 here is just...
@coldmanck Training resources: 16 V100 (32G) GPUs with a total batch size of 1024; training for 2M iterations takes about 20 weeks. We do not use AMP, and the 32G memory is sufficient. If you...
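For reference, the per-GPU batch size and total sample count implied by these numbers can be sketched as below. This is just illustrative arithmetic; the helper names are hypothetical and not part of the VinVL codebase:

```python
# Hypothetical sketch: split the 1024 total batch evenly across 16 GPUs
# and count the samples processed over 2M iterations.

def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    # Each GPU holds an equal slice of the global batch.
    assert total_batch % num_gpus == 0, "total batch must divide evenly"
    return total_batch // num_gpus

def samples_seen(total_batch: int, iterations: int) -> int:
    # Total training samples processed over the whole run.
    return total_batch * iterations

print(per_gpu_batch(1024, 16))        # → 64 images per V100
print(samples_seen(1024, 2_000_000))  # → 2,048,000,000 samples
```

So the 1024 total batch works out to 64 images per GPU, which is why 32G of memory suffices without AMP.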
@309018451 For the image tags, the image tags used in both NoCaps training and testing are generated from an OD model pretrained on OpenImages dataset, not from the model vinvl_vg_x152c4.pth....
@lucidrains I recently used your implementations of performer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/performer.py) and linformer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/linformer.py) to compare different efficient attention mechanisms on image classification and object detection tasks. See the results reported here:...
The checkpoint download link was updated:
# pretrained models at https://drive.google.com/file/d/1nvu8y4zZFbJqSqLdQClvMyzsbBeX-oaQ/view?usp=sharing
# the associated labelmap at https://drive.google.com/file/d/1M1nPtMPHS1GXx5HvKMMDxOksxQgwX14_/view?usp=sharing
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml \
    TEST.IMS_PER_BATCH 2 \
    MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth \
    MODEL.ROI_HEADS.NMS_FILTER 1 \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2 \
    DATA_DIR...
@ckzbullbullet Did you solve the problem?
It is generated by an object detection model trained on COCO. In fact, you can use tags generated by the VinVL models without any accuracy drop. The trick is that...
Yes, @CCYChongyanChen, "an" and "s" are the answers and scores; they are not used in the model, but they are used in the evaluation. The order does not matter, but it matters to only keep...
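A minimal sketch of how parallel "an" (answers) and "s" (scores) lists can be consumed order-independently during evaluation. Only the two key names come from the discussion above; the entry layout and the keep-highest-score rule for duplicates are illustrative assumptions, not the dataset's specification:

```python
# Hypothetical VQA-style annotation entry. Only the "an" and "s" keys are
# taken from the discussion; the values and dedup rule are assumptions.
entry = {"an": ["cat", "kitten", "cat"], "s": [1.0, 0.3, 1.0]}

def answer_scores(entry):
    # Pair each answer string with its score. Because evaluation looks
    # answers up by string, the order of the parallel lists is irrelevant.
    scores = {}
    for ans, sc in zip(entry["an"], entry["s"]):
        # Assumed policy: keep the highest score for a repeated answer.
        scores[ans] = max(scores.get(ans, 0.0), sc)
    return scores

print(answer_scores(entry))  # → {'cat': 1.0, 'kitten': 0.3}
```

Shuffling the two lists together leaves the resulting answer-to-score mapping unchanged, which is why the order does not matter.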
Please refer to the repo https://github.com/microsoft/scene_graph_benchmark for feature extraction. If you only find minor differences in the predictions (by inspecting them with the visualization tool tools/demo/demo_image.py), then the extracted features should be...