TCL Question about VQA fine-tuning

Open czy-orange opened this issue 1 year ago • 0 comments

Hi Jinyu,

Thanks for sharing the code for TCL, it is great work. I have two questions about `model_vqa.py`:

1. Top-k answers for each question: shouldn't the code use `answer_ids[b]` and `answer_atts[b]`?
2. Use of the text decoder: given `targets_ids = input_ids.masked_fill(input_ids == self.tokenizer.pad_token_id, -100)`, `targets_ids` is the same as `input_ids` except at the pad token positions. So what is the point of calculating the loss and generating the answer a second time?
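To make question 2 concrete, here is a minimal sketch of how I read that line. It is a plain-Python stand-in for the tensor operation, not code from `model_vqa.py`: the token ids, the pad id of 0, and the helper name `mask_padding` are all made up for illustration. As far as I understand, -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so the filled positions would simply be skipped when the loss is computed.

```python
# Hypothetical values for illustration only; not taken from model_vqa.py.
PAD_TOKEN_ID = 0      # assumed pad id, stands in for self.tokenizer.pad_token_id
IGNORE_INDEX = -100   # the fill value used in the quoted line


def mask_padding(input_ids, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Plain-Python equivalent, for one sequence, of
    input_ids.masked_fill(input_ids == pad_token_id, ignore_index)."""
    return [ignore_index if tok == pad_token_id else tok for tok in input_ids]


# Made-up answer token ids followed by two padding positions.
input_ids = [101, 2748, 102, 0, 0]
targets_ids = mask_padding(input_ids)

# Positions where the targets differ from the inputs: only the padded ones.
changed = [i for i, (a, b) in enumerate(zip(input_ids, targets_ids)) if a != b]

print(targets_ids)
print(changed)
```

Under these assumptions the targets differ from the inputs only at the padded positions, which is exactly what prompts the question.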

Thanks!

Jun 07 '23 06:06 czy-orange
