transformers
Add Donut
What does this PR do?
This PR adds Donut to the library. Donut is to LayoutLM what T5 is to BERT. :D
The model is implemented as an instance of our existing VisionEncoderDecoderModel.
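The sketch below is a toy, dependency-free illustration of why no new top-level class is needed. It uses a generic wrapper that owns any vision encoder and any autoregressive text decoder, and it runs greedy generation by encoding once and then decoding token by token. All names (ToyVisionEncoderDecoder, the lambda encoder/decoder, the token ids) are hypothetical. This is not the transformers implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical toy types: an "image" is a flat list of pixel values,
# encoder outputs are a list of floats, tokens are integer ids.
Encoder = Callable[[Sequence[float]], List[float]]
Decoder = Callable[[List[float], List[int]], int]  # returns the next token id


@dataclass
class ToyVisionEncoderDecoder:
    """Generic wrapper: any vision encoder plus any autoregressive text decoder."""

    encoder: Encoder
    decoder: Decoder
    bos_token_id: int = 0
    eos_token_id: int = 1

    def generate(self, pixel_values: Sequence[float], max_length: int = 10) -> List[int]:
        # Encode the image once; every decoding step conditions on these states.
        hidden_states = self.encoder(pixel_values)
        tokens = [self.bos_token_id]
        while len(tokens) < max_length:
            next_token = self.decoder(hidden_states, tokens)  # greedy step
            tokens.append(next_token)
            if next_token == self.eos_token_id:
                break
        return tokens


# Usage with stand-in components: swapping in a different encoder or decoder
# requires no change to the wrapper itself.
model = ToyVisionEncoderDecoder(
    encoder=lambda px: [sum(px)],
    decoder=lambda h, toks: 1 if len(toks) > round(h[0]) else len(toks) + 1,
)
print(model.generate([1.0, 2.0]))  # → [0, 2, 3, 4, 1]
```

In this design, a new document model only has to supply an encoder (and reuse an existing decoder). The wrapper's generation loop stays the same.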
See also https://github.com/clovaai/donut/issues/10#issue-1324734927
I have implemented a new DonutSwinModel, which copies everything from SwinModel except the final layer norm. I've added it in a file called modeling_donut_swin.py (and implemented a corresponding DonutSwinConfig in configuration_donut_swin.py).
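This toy, dependency-free sketch shows what "everything except the final layer norm" means structurally. ToySwinEncoder and ToyDonutSwinEncoder are hypothetical stand-ins that operate on plain lists of floats, and the layer norm has no learnable scale or bias. Inheritance is used here only for brevity. The PR describes copying the SwinModel code into a separate file instead.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Plain layer norm over one vector, without learnable scale/bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


class ToySwinEncoder:
    """Stand-in for SwinModel: a stack of stages followed by a final layer norm."""

    use_final_layernorm = True

    def __init__(self, stages: Sequence[Callable[[Vector], Vector]]):
        self.stages = list(stages)

    def forward(self, x: Vector) -> Vector:
        for stage in self.stages:
            x = stage(x)
        if self.use_final_layernorm:
            x = layer_norm(x)
        return x


class ToyDonutSwinEncoder(ToySwinEncoder):
    """Stand-in for DonutSwinModel: everything inherited, final layer norm dropped."""

    use_final_layernorm = False


stages = [lambda x: [2.0 * v for v in x]]
print(ToyDonutSwinEncoder(stages).forward([1.0, 2.0, 3.0]))  # → [2.0, 4.0, 6.0]
```

With the same stages, the toy Swin variant returns normalized features, while the toy Donut variant passes the last stage's output through unchanged.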
I went with modeling_donut_swin.py (and configuration_donut_swin.py) in the "donut" folder rather than modeling_donut.py (and configuration_donut.py), since these files only implement the model and configuration of the encoder (Swin Transformer); the decoder reuses BART. Let me know if this is OK.
Hi @NielsRogge, do you plan on supporting the Document Parsing modality?