mm-cot
T5ForMultimodalGeneration Inference
I was trying to use the model for inference, but it's currently not supported yet, right?
Maybe I'm overcomplicating this, but the way I see it, one would have to change the model.generate() method to work with T5ForMultimodalGeneration because of the additional input argument (image_ids). At least, that's what I tried to do, but I haven't succeeded yet and thought it would be better to ask before spending more time on debugging.
Cheers
bump
What do you mean "it's currently not supported yet"? I've managed to use it for inference (but only for the rationale, at the moment)
@gianfrancodemarco On custom data? As far as I'm aware, the inference scripts only support inference on ground truth data (=evaluation).
For "real" inference, T5.generate() is needed, which currently only supports text inputs. But as I said, I might be wrong.
I ended up implementing generation manually myself. Here is a greedy decoding loop with 30 iterations:
```python
import torch

# tokenizer, model, and test_set_unpacked are assumed to be set up as elsewhere in the repo
cue = ('Question: Which figure of speech is used in this text? '
       'Sing, O goddess, the anger of Achilles son of Peleus, that brought '
       'countless ills upon the Achaeans. —Homer, The Iliad '
       'Context: N/A Options: (A) chiasmus (B) apostrophe Solution:')

input_dict = tokenizer.encode_plus(
    cue, padding='max_length', return_attention_mask=True, return_tensors='pt'
)
input_ids = input_dict['input_ids']
attention_mask = input_dict['attention_mask']
image_ids = test_set_unpacked[0]['image_ids'].to(torch.float32).unsqueeze(0)
labels = torch.zeros_like(input_ids)  # placeholder decoder targets, filled in step by step

predicted = []
probe = {
    'input_ids': input_ids,
    'attention_mask': attention_mask,
    'image_ids': image_ids,
    'labels': labels,
}

for i in range(30):
    res = model(**probe)
    # res[1] holds the logits; greedily pick the most likely token at the current position
    report = int(torch.argmax(res[1][0, len(predicted)]))
    probe['labels'][0, len(predicted)] = report
    predicted.append(report)
    print(tokenizer.decode(report))
```
Note that it's deterministic (greedy argmax), but you can easily modify that...
@xaiguy my previous comment wasn't about inference on custom data; it was about reproducing their experiment. However, we managed to make it support inference on custom data as well. This was needed firstly because I don't think using Seq2SeqTrainer to generate text is good practice, and secondly because it would require labels (which in my case I don't have).
I ended up overriding the _prepare_encoder_decoder_kwargs_for_generation method so that it ignores the image_ids input, and prepare_inputs_for_generation so that it also returns the image_ids.
I don't know if this is the best way to do it, but now it also works with .generate(); roughly, the idea is sketched below.
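A minimal sketch of the idea, assuming a recent-ish transformers version (the exact signature of _prepare_encoder_decoder_kwargs_for_generation has changed across versions, and this is not the literal code from my fork):

```python
from transformers import T5ForConditionalGeneration

class T5ForMultimodalGeneration(T5ForConditionalGeneration):
    # The real model also defines a forward() that accepts the extra
    # image_ids argument and fuses the image features with the text
    # encoder outputs; only the generation hooks are sketched here.

    def _prepare_encoder_decoder_kwargs_for_generation(
        self, inputs_tensor, model_kwargs, model_input_name=None
    ):
        # generate() forwards model kwargs to the text encoder, which does
        # not accept image_ids. Remove it before the standard encoder pass
        # and restore it afterwards so later forward() calls still get it.
        image_ids = model_kwargs.pop("image_ids", None)
        model_kwargs = super()._prepare_encoder_decoder_kwargs_for_generation(
            inputs_tensor, model_kwargs, model_input_name
        )
        model_kwargs["image_ids"] = image_ids
        return model_kwargs

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # T5's implementation drops kwargs it doesn't know about, so
        # re-attach image_ids to the inputs of every decoding step.
        inputs = super().prepare_inputs_for_generation(input_ids, **kwargs)
        inputs["image_ids"] = kwargs.get("image_ids")
        return inputs
```

With that in place, the usual API works, e.g.:

```python
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    image_ids=image_ids,
    max_length=256,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```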
@gianfrancodemarco Thanks, that sounds a lot simpler than what I was trying to do! Were you able to confirm that it's working as intended? For example by comparing results with and without image data.
I ran into the same issue. Is there any practical solution to this problem? @xaiguy
@xaiguy I don't think I've conducted that exact experiment, but you can test it if you want!
@gianfrancodemarco Can you please share your piece of code for this? Thank you
@sasaadi You can find it here https://github.com/gianfrancodemarco/mm-cot/blob/9e84c2ed2ef6921a56f28911a938b78453496655/src/models/t5_multimodal_generation/model.py#L202