Add Donut

Open NielsRogge opened this issue 2 years ago • 3 comments

What does this PR do?

This PR adds Donut to the library. Donut is to LayoutLM what T5 is to BERT. :D

The model is implemented as an instance of our existing VisionEncoderDecoderModel.

See also https://github.com/clovaai/donut/issues/10#issue-1324734927

Aug 05 '22 13:08 NielsRogge

The documentation is not available anymore as the PR was closed or merged.

Aug 05 '22 13:08 HuggingFaceDocBuilderDev

I have implemented a new DonutSwinModel, which copies everything from SwinModel except the final layer norm. I've added it in a file called modeling_donut_swin.py (and implemented a corresponding DonutSwinConfig in configuration_donut_swin.py).

I went with modeling_donut_swin.py (and configuration_donut_swin.py) in the "donut" folder rather than modeling_donut.py (and configuration_donut.py) since it only implements the model and configuration of the encoder part (Swin Transformer). For the decoder, BART is leveraged. Let me know if this is ok.
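
As a rough illustration of how the two halves fit together, here is a minimal sketch that composes a DonutSwinConfig encoder with a BART decoder config into a single VisionEncoderDecoderConfig. It assumes a transformers release that already includes this PR. The decoder sizes are arbitrary values picked for the example, and BartConfig is used only because BART is mentioned above; actual checkpoints may use different settings or a different BART variant.

```python
from transformers import BartConfig, DonutSwinConfig, VisionEncoderDecoderConfig

# Encoder: the Swin variant described above, with default hyperparameters
# (used here for illustration only).
encoder_config = DonutSwinConfig()

# Decoder: a small BART configuration. These sizes are assumptions made for
# the example, not values taken from this PR.
decoder_config = BartConfig(
    d_model=256,
    decoder_layers=2,
    decoder_attention_heads=4,
    decoder_ffn_dim=512,
)

# Combine both into one encoder-decoder config. This helper also marks the
# decoder config as a decoder and enables cross-attention on it.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)

print(config.encoder.model_type)
print(config.decoder.model_type)
print(config.decoder.is_decoder, config.decoder.add_cross_attention)
```

Passing a config like this to VisionEncoderDecoderModel should give a randomly initialized model with the same encoder/decoder split, which is why only the encoder needed new modeling code.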

Aug 08 '22 10:08 NielsRogge

Hi @NielsRogge, do you plan on supporting the document parsing modality?

Aug 09 '22 14:08 WaterKnight1998
