mm-cot
T5ForMultimodalGeneration Inference
I was trying to use the model for inference, but it's currently not supported yet, right?
Maybe I'm overcomplicating this, but the way I see it, one would have to change the model.generate() method to work with T5ForMultimodalGeneration because of the additional input argument (image_ids). At least, that's what I tried to do, but I haven't succeeded yet and thought it would be better to ask before spending more time on debugging.
Cheers
bump
What do you mean "it's currently not supported yet"? I've managed to use it for inference (but only for the rationale, at the moment)
@gianfrancodemarco On custom data? As far as I'm aware, the inference scripts only support inference on ground truth data (=evaluation).
For "real" inference, T5.generate() is needed, which currently only supports text inputs. But as I said, I might be wrong.
I ended up implementing generation manually myself. Here is a greedy decoding loop with 30 iterations:
```python
import torch

# tokenizer, model, and test_set_unpacked are assumed to be set up as elsewhere in the repo
cue = ('Question: Which figure of speech is used in this text? '
       'Sing, O goddess, the anger of Achilles son of Peleus, that brought '
       'countless ills upon the Achaeans. —Homer, The Iliad '
       'Context: N/A Options: (A) chiasmus (B) apostrophe Solution:')

input_dict = tokenizer.encode_plus(
    cue, padding='max_length', return_attention_mask=True, return_tensors='pt'
)
input_ids = input_dict['input_ids']
attention_mask = input_dict['attention_mask']
image_ids = test_set_unpacked[0]['image_ids'].to(torch.float32).unsqueeze(0)
labels = torch.zeros_like(input_ids)  # placeholder decoder targets, filled in step by step

predicted = []
probe = {
    'input_ids': input_ids,
    'attention_mask': attention_mask,
    'image_ids': image_ids,
    'labels': labels,
}

for i in range(30):
    res = model(**probe)
    # res[1] holds the logits; greedily pick the most likely token at the current position
    report = int(torch.argmax(res[1][0, len(predicted)]))
    probe['labels'][0, len(predicted)] = report
    predicted.append(report)
    print(tokenizer.decode(report))
```
Note that it's deterministic (greedy argmax), but you can easily modify that...
@xaiguy my previous comment wasn't about inference on custom data; it was about reproducing their experiment. However, we managed to make it support inference on custom data as well. This was needed firstly because I don't think using Seq2SeqTrainer to generate text is good practice, and secondly because it would require labels (which in my case I don't have).
I ended up overriding the _prepare_encoder_decoder_kwargs_for_generation method so that it ignores the image_ids input, and prepare_inputs_for_generation so that it also returns the image_ids.
I don't know if this is the best way to do it, but now it also works with .generate(); roughly, the idea is sketched below.
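A minimal sketch of the idea, assuming a recent-ish transformers version (the exact signature of _prepare_encoder_decoder_kwargs_for_generation has changed across versions, and this is not the literal code from my fork):

```python
from transformers import T5ForConditionalGeneration

class T5ForMultimodalGeneration(T5ForConditionalGeneration):
    # The real model also defines a forward() that accepts the extra
    # image_ids argument and fuses the image features with the text
    # encoder outputs; only the generation hooks are sketched here.

    def _prepare_encoder_decoder_kwargs_for_generation(
        self, inputs_tensor, model_kwargs, model_input_name=None
    ):
        # generate() forwards model kwargs to the text encoder, which does
        # not accept image_ids. Remove it before the standard encoder pass
        # and restore it afterwards so later forward() calls still get it.
        image_ids = model_kwargs.pop("image_ids", None)
        model_kwargs = super()._prepare_encoder_decoder_kwargs_for_generation(
            inputs_tensor, model_kwargs, model_input_name
        )
        model_kwargs["image_ids"] = image_ids
        return model_kwargs

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # T5's implementation drops kwargs it doesn't know about, so
        # re-attach image_ids to the inputs of every decoding step.
        inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
        inputs["image_ids"] = kwargs.get("image_ids")
        return inputs
```

With that in place, the usual API works, e.g.:

```python
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    image_ids=image_ids,
    max_length=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```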
@gianfrancodemarco Thanks, that sounds a lot simpler than what I was trying to do! Were you able to confirm that it's working as intended? For example by comparing results with and without image data.
I ran into the same issue. Is there any practical solution to this problem? @xaiguy
@xaiguy I don't think I've conducted that exact experiment, but you can test it if you want!
@gianfrancodemarco Can you please share your piece of code for this? Thank you
@sasaadi You can find it here https://github.com/gianfrancodemarco/mm-cot/blob/9e84c2ed2ef6921a56f28911a938b78453496655/src/models/t5_multimodal_generation/model.py#L202