TCL
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Is there an easy way to run inference with the model on new examples? Also, are there any plans to put the model on Hugging Face?
Hi, thanks for this wonderful work. I am confused about the CrossAttention module. In the XBERT code, when `layer_num >= 6`, the `text_encoder` switches to cross-attention; however, it will do...
Hi, > `# 10% of the time, we replace masked input tokens with random word` `indices_random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked_indices & ~indices_replaced` Here is the code you use to...
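For readers puzzling over how a Bernoulli probability of 0.5 can correspond to "10% of the time", the sketch below simulates the arithmetic in plain Python. It assumes the usual BERT-style split, in which 80% of the selected tokens are replaced by the mask token; the 0.8 and the 0.15 selection rate are assumptions here, since the quoted snippet shows only the 0.5 draw. The function name `mlm_corruption_counts` is hypothetical and is not part of the TCL code.

```python
import random


def mlm_corruption_counts(n_tokens, mask_prob=0.15, seed=0):
    """Count how selected tokens are corrupted under an assumed 80/10/10 scheme."""
    rng = random.Random(seed)
    counts = {"mask": 0, "random": 0, "keep": 0}
    for _ in range(n_tokens):
        # Assumption: each token is selected for prediction with prob. mask_prob.
        if rng.random() >= mask_prob:
            continue
        # Assumption: 80% of the selected tokens are replaced by the mask token.
        if rng.random() < 0.8:
            counts["mask"] += 1
        # Of the remaining 20%, a 0.5 coin flip picks a random word. This plays
        # the role of bernoulli(0.5) & masked_indices & ~indices_replaced.
        elif rng.random() < 0.5:
            counts["random"] += 1
        else:
            counts["keep"] += 1
    return counts


counts = mlm_corruption_counts(200_000)
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind}: {n / total:.3f}")
```

Under these assumptions the random-word share is 0.2 × 0.5 = 0.1 of the selected tokens, which is how a 0.5 draw and a "10%" comment can describe the same step.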
I really appreciate the code. I have a question: how do you regenerate the annotation file after using img2dataset, since some URLs are not available? And we can't use the...
Hi Jinyu, Thanks for sharing the code of the great work TCL. I have some questions about the code of `model_vqa.py`. 1. [top k answers for each question](https://github.com/uta-smile/TCL/blob/74a3e4f963a77ba43f2a2e2abe02bbeea22eba09/models/model_vqa.py#L169-L171), shouldn't the...
Line 249 in [models](https://github.com/uta-smile/TCL/tree/main/models/model_retrieval.py), `u_p = (temp_mask * u_p) + (10000. * (1-temp_mask))`, should perhaps be `u_p = (temp_mask * u_p) - (10000. * (1-temp_mask))`?
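As background for the sign question, here is a generic sketch of what a large positive or a large negative offset does to the log-softmax value of the entry it is added to. It is not the repository's code: the `log_softmax` helper, the example logits, and the placement of the offset in slot 0 are all assumptions made for illustration, and whether either case matches the code around line 249 is not established here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


BIG = 10000.0
other_logits = [0.3, -1.2, 2.0]  # made-up values for the competing entries

# Offset +BIG in slot 0: that slot dominates the softmax, so its
# log-probability is about 0 and a loss of -log p would be about 0.
plus_case = log_softmax([BIG] + other_logits)[0]

# Offset -BIG in slot 0: that slot is effectively excluded, so its
# log-probability is hugely negative and -log p would be very large.
minus_case = log_softmax([-BIG] + other_logits)[0]

print(plus_case, minus_case)
```

Which sign is appropriate therefore depends on the role of the masked entry: a `-10000` offset suppresses an entry that competes in the softmax, whereas a `+10000` offset on the entry whose log-probability is being read drives its loss contribution toward zero.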