
Add Donut

NielsRogge opened this issue 2 years ago · 3 comments

What does this PR do?

This PR adds Donut to the library. Donut is to LayoutLM what T5 is to BERT. :D

The model is implemented as an instance of our existing VisionEncoderDecoderModel.

See also https://github.com/clovaai/donut/issues/10#issue-1324734927

NielsRogge, Aug 05 '22 13:08

The documentation is not available anymore as the PR was closed or merged.

I have implemented a new DonutSwinModel that copies everything from SwinModel except the final layer norm. It lives in a file called modeling_donut_swin.py, with a corresponding DonutSwinConfig in configuration_donut_swin.py.
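A quick way to check the difference described above, sketched under a few assumptions: a transformers release that ships both SwinModel and DonutSwinModel, and tiny randomly initialised configs whose sizes are illustrative only (they are not Donut's real hyperparameters). The check inspects whether a `layernorm` submodule exists and prints the shape of the un-normalised encoder output.

```python
import torch
from transformers import DonutSwinConfig, DonutSwinModel, SwinConfig, SwinModel

# Tiny illustrative sizes (assumption), shared by both encoders.
common = dict(
    image_size=32, patch_size=4, embed_dim=16,
    depths=[1, 1], num_heads=[1, 2], window_size=4,
)

swin = SwinModel(SwinConfig(**common)).eval()
donut_swin = DonutSwinModel(DonutSwinConfig(**common)).eval()

# SwinModel normalises its final hidden states; DonutSwinModel drops that layer.
print(hasattr(swin, "layernorm"), hasattr(donut_swin, "layernorm"))

pixel_values = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out = donut_swin(pixel_values=pixel_values)

# 8x8 patches are merged once to 4x4 = 16 tokens, each of width 16 * 2 = 32.
print(out.last_hidden_state.shape)
```

Because the encoder output is not normalised here, a downstream decoder receives the raw stage outputs.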

I placed modeling_donut_swin.py and configuration_donut_swin.py in the "donut" folder, rather than naming them modeling_donut.py and configuration_donut.py, because they only implement the model and configuration of the encoder (Swin Transformer). The decoder reuses BART. Let me know if this is OK.
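To illustrate how a DonutSwin encoder and a BART decoder can be combined inside the existing VisionEncoderDecoderModel, here is a minimal sketch. It assumes a transformers release that includes DonutSwinConfig, uses BartConfig for the decoder (the exact decoder config class used by released checkpoints is not stated here), and picks tiny, randomly initialised sizes purely for illustration. The encoder's final hidden size (32) is matched to the decoder's `d_model` so no projection layer is needed.

```python
import torch
from transformers import (
    BartConfig,
    DonutSwinConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Illustrative encoder: final hidden size = 16 * 2 ** (2 - 1) = 32.
encoder_config = DonutSwinConfig(
    image_size=32, patch_size=4, embed_dim=16,
    depths=[1, 1], num_heads=[1, 2], window_size=4,
)
# Illustrative BART config; only its decoder half is instantiated.
decoder_config = BartConfig(
    vocab_size=100, d_model=32, max_position_embeddings=64,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=64, decoder_ffn_dim=64,
)

# Marks the decoder as causal and enables cross-attention over encoder outputs.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = VisionEncoderDecoderModel(config=config).eval()

pixel_values = torch.randn(1, 3, 32, 32)
decoder_input_ids = torch.tensor([[0, 5, 6]])  # arbitrary token ids
with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

print(type(model.encoder).__name__, type(model.decoder).__name__)
print(outputs.logits.shape)  # (batch, target length, vocab size)
```

Keeping the encoder and decoder as separate configs is what lets the donut folder ship only the Swin-side files while the text side comes from an existing model.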

NielsRogge, Aug 08 '22 10:08

Hi @NielsRogge, do you plan on supporting the Document Parsing modality?

WaterKnight1998, Aug 09 '22 14:08