
T5ForMultimodalGeneration Inference

Open xaiguy opened this issue 2 years ago • 10 comments

I was trying to use the model for inference, but it's not currently supported, right?

Maybe I'm overcomplicating this, but the way I see it, one would have to change the model.generate() method to work with T5ForMultimodalGeneration because of the additional input argument (image_ids). At least that's what I tried to do, but I haven't succeeded yet and thought it would be better to ask before spending more time on debugging.

Cheers

xaiguy avatar Feb 22 '23 23:02 xaiguy

bump

kshabahang avatar Feb 24 '23 01:02 kshabahang

What do you mean "it's currently not supported yet"? I've managed to use it for inference (but only for the rationale, at the moment)

gianfrancodemarco avatar Mar 01 '23 08:03 gianfrancodemarco

@gianfrancodemarco On custom data? As far as I'm aware, the inference scripts only support inference on ground truth data (=evaluation).

For "real" inference, T5.generate() is needed which currently only supports text inputs. But as I said, I might be wrong.

xaiguy avatar Mar 01 '23 08:03 xaiguy

I ended up generating manually myself. Here is an example with 30 decoding iterations:

```python
cue = ('Question: Which figure of speech is used in this text? '
       'Sing, O goddess, the anger of Achilles son of Peleus, that brought '
       'countless ills upon the Achaeans. —Homer, The Iliad '
       'Context: N/A Options: (A) chiasmus (B) apostrophe Solution:')
I = tokenizer.get_vocab()

input_dict = tokenizer.encode_plus(cue, padding='max_length',
                                   return_attention_mask=True, return_tensors='pt')
input_ids = input_dict['input_ids']
attention_mask = input_dict['attention_mask']
image_ids = test_set_unpacked[0]['image_ids'].to(torch.float32).unsqueeze(0)
labels = 0 * input_ids  # note: unused below; probe['labels'] starts from input_ids

predicted = []
probe = {'input_ids': input_ids, 'attention_mask': attention_mask,
         'image_ids': image_ids, 'labels': input_ids}

# Greedy decoding: at each step, take the argmax token at the current position,
# write it back into the labels, and run the whole dict through the model again.
for i in range(30):
    res = model(**probe)
    report = int(torch.argmax(res[1][0, len(predicted)]))
    probe['labels'][0, len(predicted)] = report
    predicted.append(report)
    print(tokenizer.decode(report))
```

Note that it's deterministic but you can easily modify that...
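For instance, the argmax in the loop above can be swapped for temperature-scaled sampling. A minimal self-contained sketch using toy logits (hypothetical values standing in for one row of the model's output, e.g. `res[1][0, len(predicted)]`; the `sample` helper and its `temperature` parameter are illustrative, not part of the repo):

```python
import math
import random

random.seed(0)

# Toy logits standing in for one position's output distribution (hypothetical values).
logits = [2.0, 0.5, -1.0, 0.1]

# Greedy (deterministic) choice, as in the loop above:
greedy_token = max(range(len(logits)), key=lambda i: logits[i])

# Stochastic alternative: temperature-scaled softmax sampling.
def sample(logits, temperature=0.8):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

sampled_token = sample(logits)
```

Lower temperatures sharpen the distribution toward the greedy choice; higher ones flatten it.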

kshabahang avatar Mar 01 '23 18:03 kshabahang

@xaiguy my previous comment wasn't about inference on custom data, but about reproducing their experiment. However, we've since made it support inference on custom data as well. This was needed firstly because I don't think using Seq2SeqTrainer to generate text is good practice, and secondly because it requires labels (which in my case I don't have).

I ended up overriding the _prepare_encoder_decoder_kwargs_for_generation method so that it ignores the image_ids input, and prepare_inputs_for_generation so that it also returns the image_ids.

I don't know if this is the best way to do it, but it now works with .generate() as well.
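The two overrides described above can be sketched as follows. The method names mirror transformers' GenerationMixin, but the base class here is a stub (with simplified signatures) so the snippet runs without the real model; treat it as an illustration of the pattern, not a drop-in patch:

```python
class TextOnlyBase:
    """Stub for the text-only generation plumbing: rejects unknown kwargs,
    roughly as the text encoder preparation would for image_ids."""

    ALLOWED = {"input_ids", "attention_mask"}

    def _prepare_encoder_decoder_kwargs_for_generation(self, model_kwargs):
        unknown = set(model_kwargs) - self.ALLOWED
        if unknown:
            raise TypeError(f"unexpected kwargs: {unknown}")
        return dict(model_kwargs)

    def prepare_inputs_for_generation(self, decoder_input_ids, **model_kwargs):
        return {"decoder_input_ids": decoder_input_ids}


class MultimodalPatched(TextOnlyBase):
    def _prepare_encoder_decoder_kwargs_for_generation(self, model_kwargs):
        # 1) Hide image_ids from the text-only encoder preparation...
        model_kwargs = dict(model_kwargs)
        image_ids = model_kwargs.pop("image_ids", None)
        prepared = super()._prepare_encoder_decoder_kwargs_for_generation(model_kwargs)
        # ...then put it back so later decoding steps can still see it.
        prepared["image_ids"] = image_ids
        return prepared

    def prepare_inputs_for_generation(self, decoder_input_ids, **model_kwargs):
        # 2) Ensure each decoding step also receives the image features.
        inputs = super().prepare_inputs_for_generation(decoder_input_ids, **model_kwargs)
        if "image_ids" in model_kwargs:
            inputs["image_ids"] = model_kwargs["image_ids"]
        return inputs
```

In the real model one would subclass (or monkey-patch) T5ForMultimodalGeneration with overrides of this shape, keeping the actual HF method signatures.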

gianfrancodemarco avatar Mar 08 '23 15:03 gianfrancodemarco

@gianfrancodemarco Thanks, that sounds a lot simpler than what I was trying to do! Were you able to confirm that it's working as intended? For example by comparing results with and without image data.

xaiguy avatar Mar 08 '23 16:03 xaiguy

I've run into the same issue. Is there a practical solution to this problem? @xaiguy

WeixuanXiong avatar Apr 04 '23 08:04 WeixuanXiong

@xaiguy I don't think I've run that exact experiment, but you can test it if you want!

gianfrancodemarco avatar Apr 04 '23 14:04 gianfrancodemarco

> @xaiguy my previous comment wasn't about inference on custom data, but about reproducing their experiment. However, we've since made it support inference on custom data as well. This was needed firstly because I don't think using Seq2SeqTrainer to generate text is good practice, and secondly because it requires labels (which in my case I don't have).
>
> I ended up overriding the _prepare_encoder_decoder_kwargs_for_generation method so that it ignores the image_ids input, and prepare_inputs_for_generation so that it also returns the image_ids.
>
> I don't know if this is the best way to do it, but it now works with .generate() as well.

Can you please share your code for this? Thank you.

sasaadi avatar May 02 '23 19:05 sasaadi

@sasaadi You can find it here https://github.com/gianfrancodemarco/mm-cot/blob/9e84c2ed2ef6921a56f28911a938b78453496655/src/models/t5_multimodal_generation/model.py#L202

gianfrancodemarco avatar May 02 '23 19:05 gianfrancodemarco