YehLi

9 comments of YehLi

The kernel sizes of the conv2d and the CotLayer are both 3×3; you can refer to Table 6 for the comparison. The parameters and FLOPs are calculated by utils/flops_counter.py (get_model_complexity_info), as sketched below.
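A minimal sketch of the complexity measurement, assuming the bundled utils/flops_counter.py mirrors the ptflops-style get_model_complexity_info API (the resnet50 backbone here is just a stand-in for the actual model):

```python
import torch
import torchvision.models as models
from ptflops import get_model_complexity_info  # same API as utils/flops_counter.py

model = models.resnet50()  # stand-in backbone; substitute the CoTNet model
with torch.no_grad():
    macs, params = get_model_complexity_info(
        model,
        (3, 224, 224),               # input shape (C, H, W); batch dim is added internally
        as_strings=True,             # return human-readable strings
        print_per_layer_stat=False,  # suppress the per-layer breakdown
    )
print(f"MACs: {macs}  Params: {params}")
```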

Thanks for your reply. How many GPUs do you use for training on ScanNetV2?

You can refer to the OpenAI CLIP repository (https://github.com/openai/CLIP) for more details.
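For reference, loading the model and encoding an image follows the pattern from the CLIP README (the image path below is a placeholder):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "example.jpg" is a placeholder path
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)  # (1, 512) for ViT-B/32
print(image_features.shape)
```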

The semantics label file has been uploaded to configs/image_caption/cosnet/semantics_labels.txt.

You can save the attention scores into a txt file and then use the Up-Down-Captioner demo notebook (https://github.com/peteanderson80/Up-Down-Captioner/blob/master/scripts/demo.ipynb) to generate the visualization.
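A hypothetical sketch of the dump step (the save_attention helper and the tab-separated layout are assumptions, not a format the notebook requires):

```python
import numpy as np

def save_attention(words, att_scores, path="attention.txt"):
    # att_scores: (num_words, num_regions) attention weights, one row per generated word
    with open(path, "w") as f:
        for word, scores in zip(words, att_scores):
            f.write(word + "\t" + " ".join(f"{s:.6f}" for s in scores) + "\n")

# Example: two generated words attending over 36 image regions
save_attention(["a", "dog"], np.random.rand(2, 36))
```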

The SENet-154 features have been released (see Data preparation: the pretrained SENet-154 model can be downloaded [here](https://drive.google.com/file/d/1CrWJcdKLPmFYVdVNcQLviwKGtAREjarR/view?usp=sharing)). You can refer to the code for ensembling (https://github.com/YehLi/xmodaler/blob/master/xmodaler/modeling/decode_strategy/ensemble_beam_searcher.py).
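A minimal sketch of the ensembling idea, assuming each model exposes a per-step step() returning log-probabilities (that interface is hypothetical; ensemble_beam_searcher.py implements the full beam-search version):

```python
import torch

def ensemble_step(models, states, tokens):
    # Average probabilities (not log-probs) across models, then take the log,
    # so no single model can veto a token with a -inf score.
    probs = [model.step(tokens, state).exp() for model, state in zip(models, states)]
    return torch.stack(probs).mean(dim=0).log()
```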

1. Please try changing `selected_beam = selected_idx / candidate_logprob.shape[-1]` to `selected_beam = selected_idx // candidate_logprob.shape[-1]` for newer PyTorch versions. The code works correctly on my server after this change.
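The reason is that the flat top-k index encodes `beam_id * vocab_size + token_id`, and recent PyTorch treats `/` on tensors as true division, so floor division is needed to recover the integer beam index. A runnable illustration:

```python
import torch

vocab_size = 5
candidate_logprob = torch.randn(2, 3, vocab_size)      # (batch, beams, vocab)
flat = candidate_logprob.view(2, -1)                   # (batch, beams * vocab)
selected_logprob, selected_idx = flat.topk(3, dim=-1)  # top-k over all candidates

selected_beam = selected_idx // vocab_size   # integer beam index (floor division)
selected_token = selected_idx % vocab_size   # token index within that beam
```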

The slot is used to predict the missing semantics. The training sentences are from the COCO training set. The preprocessing code will be uploaded later.