datasets Add DocVQA

Open NielsRogge opened this issue 1 year ago • 1 comment

Name: DocVQA
Description: Document Visual Question Answering (DocVQA) seeks to inspire a “purpose-driven” point of view in Document Analysis and Recognition research, where the document content is extracted and used to respond to high-level tasks defined by the human consumers of this information.
Paper: https://arxiv.org/abs/2007.00398
Data: https://www.docvqa.org/datasets/docvqa
Motivation: Models like LayoutLM and Donut in the Transformers library are fine-tuned on DocVQA. It would be very handy to load this dataset directly from the Hub.

Instructions to add a new dataset can be found here.

Aug 04 '22 13:08 NielsRogge

Thanks for proposing, @NielsRogge.

Please note that this dataset requires registering on their website, and their Terms and Conditions state that we cannot distribute the download URLs:

1. You will NOT distribute the download URLs
...

Aug 08 '22 05:08 albertvillanova