YehLi

9 comments of YehLi

The kernel sizes of the conv2d and the CotLayer are both 3×3; you can refer to Table 6 for the comparison. The parameters and FLOPs are calculated by utils/flops_counter.py (get_model_complexity_info), as sketched below.
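A minimal sketch of the complexity measurement, assuming the bundled utils/flops_counter.py mirrors the ptflops-style get_model_complexity_info API (the resnet50 backbone here is just a stand-in for the actual model):

```python
import torch
import torchvision.models as models
from ptflops import get_model_complexity_info  # same API as utils/flops_counter.py

model = models.resnet50()  # stand-in backbone; substitute the CoTNet model
with torch.no_grad():
    macs, params = get_model_complexity_info(
        model,
        (3, 224, 224),               # input shape (C, H, W); batch dim is added internally
        as_strings=True,             # return human-readable strings
        print_per_layer_stat=False,  # suppress the per-layer breakdown
    )
print(f"MACs: {macs}  Params: {params}")
```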

Thanks for your reply. How many GPUs do you use for training on ScanNetV2?

You can refer to the OpenAI CLIP repository (https://github.com/openai/CLIP) for more details.
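For reference, loading the model and encoding an image follows the pattern from the CLIP README (the image path below is a placeholder):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "example.jpg" is a placeholder path
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)  # (1, 512) for ViT-B/32
print(image_features.shape)
```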

The semantics label file has been uploaded to configs/image_caption/cosnet/semantics_labels.txt.

You can save the attention scores into a txt file and then use the Up-Down-Captioner demo notebook (https://github.com/peteanderson80/Up-Down-Captioner/blob/master/scripts/demo.ipynb) to generate the visualization.
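A hypothetical sketch of the dump step (the save_attention helper and the tab-separated layout are assumptions, not a format the notebook requires):

```python
import numpy as np

def save_attention(words, att_scores, path="attention.txt"):
    # att_scores: (num_words, num_regions) attention weights, one row per generated word
    with open(path, "w") as f:
        for word, scores in zip(words, att_scores):
            f.write(word + "\t" + " ".join(f"{s:.6f}" for s in scores) + "\n")

# Example: two generated words attending over 36 image regions
save_attention(["a", "dog"], np.random.rand(2, 36))
```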

The SENet-154 features have been released (see Data preparation: the pretrained SENet-154 model can be downloaded [here](https://drive.google.com/file/d/1CrWJcdKLPmFYVdVNcQLviwKGtAREjarR/view?usp=sharing)). You can refer to the code for ensembling (https://github.com/YehLi/xmodaler/blob/master/xmodaler/modeling/decode_strategy/ensemble_beam_searcher.py).
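A minimal sketch of the ensembling idea, assuming each model exposes a per-step step() returning log-probabilities (that interface is hypothetical; ensemble_beam_searcher.py implements the full beam-search version):

```python
import torch

def ensemble_step(models, states, tokens):
    # Average probabilities (not log-probs) across models, then take the log,
    # so no single model can veto a token with a -inf score.
    probs = [model.step(tokens, state).exp() for model, state in zip(models, states)]
    return torch.stack(probs).mean(dim=0).log()
```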

1. Please try changing `selected_beam = selected_idx / candidate_logprob.shape[-1]` to `selected_beam = selected_idx // candidate_logprob.shape[-1]` for newer PyTorch versions. The code works correctly on my server after this change.
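The reason is that the flat top-k index encodes `beam_id * vocab_size + token_id`, and recent PyTorch treats `/` on tensors as true division, so floor division is needed to recover the integer beam index. A runnable illustration:

```python
import torch

vocab_size = 5
candidate_logprob = torch.randn(2, 3, vocab_size)      # (batch, beams, vocab)
flat = candidate_logprob.view(2, -1)                   # (batch, beams * vocab)
selected_logprob, selected_idx = flat.topk(3, dim=-1)  # top-k over all candidates

selected_beam = selected_idx // vocab_size   # integer beam index (floor division)
selected_token = selected_idx % vocab_size   # token index within that beam
```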

The slot is used to predict the missing semantics. The training sentences are from the COCO training set. The preprocessing code will be uploaded later.