transformers
[WIP] Add UDOP models
#20650
The documentation is not available anymore as the PR was closed or merged.
@sgugger @NielsRogge The model weights are here: https://huggingface.co/ZinengTang/Udop/tree/main. But how do I get the configs for these models?
@raghavanone For reference, someone asked the same question on the UDOP repo: https://github.com/microsoft/i-Code/issues/17
Note: I cannot proceed further until Microsoft releases the full set of weights. The vision decoder weights have not been released yet.
If I'm not mistaken, the vision decoder weights should not be needed when using only the text-layout decoder.
The vision_encoder weights are part of the shared model weights.
@raghavanone is there anything else blocking? It sounds like we can proceed with the given weights, assuming that we notify users that the vision decoder is not trained.
@logan-markewich Yes, I will work on closing this within a couple of days.
@sgugger I need some pointers on how this model should be tested. Can I follow the tests used for the T5 model and replicate similar tests?
@NielsRogge Any pointers here?
I hope it gets merged soon @raghavanone . Nice work :)
Forgive my naiveté, why do all the tests call from_pretrained() on some variation of t5? The UDOP model checkpoints are here. Could these be used?
Ah, I see that the test script they provide also uses T5-large; I expected it to use one of those checkpoints.
@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄
@thefirebanks I am working on fixing the last few tests. Hoping to close this PR very soon. Sorry for the delay.
@raghavanone I am currently trying to finetune UdopUniModelForConditionalGeneration using this PR. I ran into the following exception while training:
File "/opt/conda/lib/python3.8/site-packages/transformers/models/udop/modeling_udop.py", line 2422, in forward
encoder_outputs = self.encoder(
TypeError: forward() got an unexpected keyword argument 'ids_keep'
I explained what appears to be happening in this comment.
It looks like the ids_keep parameter was removed from UdopUniStack but not from the call to it in UdopUniModelForConditionalGeneration.
EDIT
It looks like output_attentions also needs to be removed from that call.
And in the self.decoder() call, cross_attn_head_mask and output_attentions need to be removed as well.
Happy to make the changes myself with repo permissions
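For readers hitting the same error, the failure mode can be reproduced in isolation. The sketch below is plain Python with a hypothetical stand-in class (`EncoderStack` is not the real `UdopUniStack`, and `call_with_supported_kwargs` is an illustrative helper, not part of transformers). It shows how a stale keyword argument triggers the `TypeError`, and how the callee's signature can be inspected to find which arguments the call site still needs to drop.

```python
import inspect


class EncoderStack:
    """Hypothetical stand-in for an encoder stack; not the real UdopUniStack."""

    def forward(self, input_ids, attention_mask=None):
        # The 'ids_keep' parameter has been removed from this signature.
        return {"input_ids": input_ids, "attention_mask": attention_mask}


def call_with_supported_kwargs(fn, **kwargs):
    """Call fn, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(fn).parameters
    # If fn takes **kwargs, everything is accepted and nothing is filtered.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(**supported)


encoder = EncoderStack()

# The call site still passes the removed argument.
stale_kwargs = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "ids_keep": None}

try:
    encoder.forward(**stale_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Filtering by signature lets the call go through.
filtered_output = call_with_supported_kwargs(encoder.forward, **stale_kwargs)
```

Filtering like this is only a debugging aid. The actual fix is the one described above: remove the stale arguments from the call sites.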
@plamb-viso Yes, those parameters were not removed in all places; I have fixed it locally. I am working on fixing the failing tests. This is the last step pending before merging. Fixing these tests is taking more time than expected.
@raghavanone I saw you closed this PR. Skimming over your work, the PR seemed to be in a rather good state. Were there any blockers you encountered? IMO, it would be nice to add UDOP models to Hugging Face at some point.
@maxjeblick @NielsRogge feels that the code in the original repo is a bit hacky. He is working on a separate PR to add UDOP with a better implementation, so I closed this one in consultation with him. He should open a PR soon.
@NielsRogge please do add more details for the benefit of folks following this PR
Thanks a lot for the fast reply!
@NielsRogge @raghavanone please link the new PR when it's available for people subscribed to this one
Hi yes I'll open a PR soon! Thanks a lot for your work already @raghavanone, will ping you on the PR
Hi @NielsRogge I saw the large number of commits on your new UDOP branch, curious if you have any idea when a PR might be ready
Sorry to keep hammering on this, but I have again noticed a flurry of activity on that branch followed by almost 2 weeks off. Curious what the plan is for it @NielsRogge
Hi @plamb-viso sorry for the late reply, the model is working; I only have limited time to work on it. I'll open a PR this weekend/Monday.
For now you can already use the model if you're curious, check this code example regarding usage. Model is already on the hub here.
Out of curiosity @NielsRogge : did you ever use your implementation to fine tune it on a task like CORD?
I've fine-tuned the model on a toy dataset of RVL-CDIP. It works well, but the model is pretty heavy: I got OOM on Google Colab even with batch size = 1, so I had to use a bigger GPU. The author only released large variants.
In my original work on @raghavanone's version of the model, I also had to use a batch size of 1 to keep it from going OOM on 40 GB GPUs
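A rough back-of-the-envelope estimate helps explain why even batch size 1 is tight. The sketch below assumes full fp32 fine-tuning with an Adam-style optimizer, which needs about 16 bytes per parameter before any activations: 4 for the weight, 4 for the gradient, and 8 for the two optimizer moments. The parameter count is an assumption, taken from T5-large (roughly 770M) since the thread only says the released variants are "large". The real UDOP checkpoints, with their vision components, may differ.

```python
def estimate_training_memory_gib(
    num_params,
    bytes_per_weight=4,       # fp32 weights
    bytes_per_grad=4,         # fp32 gradients
    bytes_optimizer_state=8,  # two fp32 Adam moments per parameter
):
    """Estimate memory for weights, gradients and optimizer state only.

    Activations, which grow with batch size, sequence length and image
    resolution, are NOT included and can dominate in practice.
    """
    bytes_per_param = bytes_per_weight + bytes_per_grad + bytes_optimizer_state
    return num_params * bytes_per_param / 1024**3


# Assumption: roughly T5-large sized; not a figure stated in this thread.
ASSUMED_NUM_PARAMS = 770_000_000

estimate = estimate_training_memory_gib(ASSUMED_NUM_PARAMS)
print(f"{estimate:.1f} GiB before activations")
```

Under these assumptions the fixed cost alone already takes a large share of a typical Colab GPU, leaving little room for activations.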