transformers icon indicating copy to clipboard operation
transformers copied to clipboard

[WIP] Add UDOP models

Open raghavanone opened this issue 2 years ago • 16 comments
trafficstars

#20650

raghavanone avatar Jan 22 '23 05:01 raghavanone

The documentation is not available anymore as the PR was closed or merged.

@sgugger @NielsRogge The model weights are here https://huggingface.co/ZinengTang/Udop/tree/main , But how to get the config for these models ?

raghavanone avatar Jan 26 '23 05:01 raghavanone

@raghavanone For reference, someone asked the same question on the UDOP repo: https://github.com/microsoft/i-Code/issues/17

logan-markewich avatar Jan 26 '23 15:01 logan-markewich

Note: Cannot proceed further without microsoft releasing the entire weights. Currently vision decoder weights have not been released.

raghavanone avatar Jan 27 '23 13:01 raghavanone

If I'm not mistaken, vision decoder weights should not be needed when using the text layout decoder part, only. vision_encoder weights are part of the shared model weights.

maxjeblick avatar Jan 27 '23 15:01 maxjeblick

@raghavanone is there anything else blocking? It sounds like we can proceed with the given weights, assuming that we notify users that the vision decoder is not trained.

logan-markewich avatar Feb 03 '23 15:02 logan-markewich

@logan-markewich Yes, I will work on closing this within couple of days .

raghavanone avatar Feb 04 '23 07:02 raghavanone

@sgugger Need some pointers on How should this model be tested ? Can I follow the tests used for T5 model and replicate similar tests ?

raghavanone avatar Feb 04 '23 15:02 raghavanone

@NielsRogge Any pointer here ?

raghavanone avatar Feb 07 '23 03:02 raghavanone

I hope it gets merged soon @raghavanone . Nice work :)

WaterKnight1998 avatar Feb 09 '23 12:02 WaterKnight1998

Forgive my naiveté, why do all the tests call from_pretrained() on some variation of t5? The UDOP model checkpoints are here. Could these be used?

plamb-viso avatar Feb 22 '23 19:02 plamb-viso

Ah, I see that the test script they provide also uses T5-large, I expected it to use one of those checkpoints

plamb-viso avatar Feb 22 '23 19:02 plamb-viso

@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄

thefirebanks avatar Mar 04 '23 16:03 thefirebanks

@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄

@thefirebanks I am working on fixing last few tests. Hoping to close this PR very soon. Sorry for the delay.

raghavanone avatar Mar 06 '23 08:03 raghavanone

@raghavanone I am currently trying to finetune UdopUniModelForConditionalGeneration using this PR. I ran into the following exception while training:

 File "/opt/conda/lib/python3.8/site-packages/transformers/models/udop/modeling_udop.py", line 2422, in forward
 encoder_outputs = self.encoder(
 TypeError: forward() got an unexpected keyword argument 'ids_keep'`

I explained what appears to be happening in this comment.

It looks like the ids_keep parameter was removed from UdopUniStack but not removed from the call to it in UdopUniModelForConditionalGeneration

EDIT Looks like output_attentions, also needs to be removed And in the self.decoder() call, cross_attn_head_mask, output_attentions

Happy to make the changes myself with repo permissions

plamb-viso avatar Mar 06 '23 22:03 plamb-viso

@raghavanone I am currently trying to finetune UdopUniModelForConditionalGeneration using this PR. I ran into the following exception while training:

 File "/opt/conda/lib/python3.8/site-packages/transformers/models/udop/modeling_udop.py", line 2422, in forward
 encoder_outputs = self.encoder(
 TypeError: forward() got an unexpected keyword argument 'ids_keep'`

I explained what appears to be happening in this comment.

It looks like the ids_keep parameter was removed from UdopUniStack but not removed from the call to it in UdopUniModelForConditionalGeneration

EDIT Looks like output_attentions, also needs to be removed And in the self.decoder() call, cross_attn_head_mask, output_attentions

Happy to make the changes myself with repo permissions

@plamb-viso Yes, removing those parameters were not done in all places, I have fixed it locally. I am working on fixing failing tests. This the last step pending for merging. Fixing these tests are taking more time than expected.

raghavanone avatar Mar 07 '23 07:03 raghavanone

@raghavanone I saw you closed this PR. Skimming over your work, the PR seemed to be in a rather good state. Where there any blockers you encountered? IMO, it would be nice to add UDOP models in Hugginface at some point.

maxjeblick avatar Mar 13 '23 15:03 maxjeblick

@maxjeblick @NielsRogge feels that the code original repo is bit hacky, he is working a separate PR to UDOP in better implementation, so closed this in consultation with him. He should open a PR soon .

@NielsRogge please do add more details for the benefit of folks following this PR

raghavanone avatar Mar 13 '23 15:03 raghavanone

Thanks a lot for the fast reply!

maxjeblick avatar Mar 13 '23 15:03 maxjeblick

@NielsRogge @raghavanone please link the new PR when its available for people subscribed to this one

plamb-viso avatar Mar 13 '23 16:03 plamb-viso

Hi yes I'll open a PR soon! Thanks a lot for your work already @raghavanone, will ping you on the PR

NielsRogge avatar Mar 13 '23 16:03 NielsRogge

Hi @NielsRogge I saw the large amount of commits on your new UDOP branch, curious if you have any idea on when you think a PR might be ready

plamb-viso avatar Mar 24 '23 16:03 plamb-viso

Sorry to keep hammering on this, but again have noticed a flurry of activity on that branch then almost 2 weeks off. Curious what the plan is for it @NielsRogge

plamb-viso avatar Apr 19 '23 14:04 plamb-viso

Hi @plamb-viso sorry for the late reply, the model is working, only have limited time to work on it. I'll open a PR this weekend/Monday.

For now you can already use the model if you're curious, check this code example regarding usage. Model is already on the hub here.

NielsRogge avatar Apr 21 '23 20:04 NielsRogge

Out of curiosity @NielsRogge : did you ever use your implementation to fine tune it on a task like CORD?

dtiarks avatar Apr 22 '23 12:04 dtiarks

I've fine-tuned the model on a toy dataset of RVL-CDIP, works well but the model is pretty heavy, got OOM on Google Colab even with batch size = 1 so had to use a bigger GPU. The author only released large variants.

NielsRogge avatar Apr 22 '23 15:04 NielsRogge

In my original work on @raghavanone 's version of the model, I also had to use a batch size of 1 to get it to not OOM on 40gb GPUs

plamb-viso avatar Apr 22 '23 18:04 plamb-viso