transformers Follow ups to DocumentQuestionAnswering Pipeline

Feature request

PR https://github.com/huggingface/transformers/pull/18414 has a number of TODOs left over, which we'd like to track as follow-up tasks.

Pipeline

[x] Add support for documents that have more words than the tokenizer span (e.g. 512)
[ ] Add support for multi-page documents (e.g. for Donut, we need to present one image per page)
[x] Rework use of tokenizer to avoid the need for add_prefix_space=True
[x] Re-add support for Donut
[ ] Refactor Donut usage in the pipeline or move logic into the tokenizer, so that pipeline does not have as much Donut-specific code
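
For readers unfamiliar with the first item, the general idea of handling a document longer than the tokenizer span can be sketched as overlapping windows. This is a minimal illustration over plain word lists, not the pipeline's actual implementation; the function name `split_into_windows` and the default sizes are assumptions made up for this example. Hugging Face tokenizers expose a comparable mechanism through the `stride` and `return_overflowing_tokens` arguments, which operate on tokens rather than words.

```python
from typing import List, Tuple


def split_into_windows(
    words: List[str], max_len: int = 512, stride: int = 128
) -> List[Tuple[int, List[str]]]:
    """Split `words` into overlapping windows of at most `max_len` words.

    Consecutive windows share `stride` words, so an answer that straddles a
    window boundary still appears whole in at least one window (as long as
    it is no longer than `stride` words). Returns (start_offset, window) pairs.

    Hypothetical helper for illustration only; not part of transformers.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append((start, words[start:start + max_len]))
        if start + max_len >= len(words):
            break  # this window already reaches the end of the document
        start += step
    return windows


# Usage: a 1000-word document with a 512-word span and 128 words of overlap.
document = [f"w{i}" for i in range(1000)]
windows = split_into_windows(document, max_len=512, stride=128)
starts = [start for start, _ in windows]
```

Each window could then be passed to the model separately, with the start offset used to map an answer back to its position in the full document.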

Testing

[ ] Enable test_small_model_pt_donut once hf-internal-testing/tiny-random-donut is implemented

Documentation / Website

[x] Add DocumentQuestionAnswering demo to Hosted Inference API so that model demos work
[ ] Add tutorial documentation to Task Summary

Motivation

These are follow ups that we cut from the initial scope of PR #18414.

Your contribution

Happy to contribute many or all of these.

Sep 07 '22 16:09 ankrgyl

cc'ing @Narsil for enabling the model on the inference API, cc'ing @stevhliu for adding tutorial documentation to the task summary

Sep 08 '22 09:09 NielsRogge

@NielsRogge because we removed donut-swin from AutoModelForDocumentQuestionAnswering, you can no longer create a pipeline with donut, i.e.

In [2]: p = pipeline('document-question-answering', model='naver-clova-ix/donut-base-finetuned-docvqa')
/Users/ankur/projects/transformers/venv/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
The model 'VisionEncoderDecoderModel' is not supported for document-question-answering. Supported models are ['LayoutLMForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering'].

Should we add it back to that list? Or what is the best way to support that?

Sep 09 '22 00:09 ankrgyl

Could we re-open this (I don't think I have permissions to)? There are still a few changes necessary to complete all of the checkboxes.

Sep 26 '22 14:09 ankrgyl

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Oct 22 '22 15:10 github-actions[bot]

@ankrgyl May I work on this? I would like to add support for multi-page documents (e.g. for Donut, we need to present one image per page). Where should I start?

Mar 23 '23 03:03 JuheonChu

Absolutely!

Feel free to start looking here: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/document_question_answering.py
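
As a rough starting point for thinking about multi-page support, one possible shape is to answer the question on each page independently and keep the highest-scoring candidate. The sketch below is only an illustration under stated assumptions: `answer_across_pages` and `answer_page` are hypothetical names, the per-page callable stands in for whatever the pipeline would do with a single page, and the answer dictionaries are assumed to carry an "answer" and a "score" key.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

PageAnswer = Dict[str, Any]


def answer_across_pages(
    pages: Sequence[Any],
    question: str,
    answer_page: Callable[[Any, str], List[PageAnswer]],
) -> Optional[PageAnswer]:
    """Run `answer_page` on every page and return the best-scoring answer.

    `answer_page` is a hypothetical stand-in for a single-page question
    answering call. Each candidate is assumed to be a dict with at least
    "answer" and "score". The winning candidate is returned with an extra
    "page" key holding the zero-based page index, or None if no page
    produced a candidate.
    """
    best: Optional[PageAnswer] = None
    for page_index, page in enumerate(pages):
        for candidate in answer_page(page, question):
            scored = {**candidate, "page": page_index}
            if best is None or scored["score"] > best["score"]:
                best = scored
    return best


# Usage with a fake single-page answerer, so the sketch runs without a model.
def fake_answer_page(page: str, question: str) -> List[PageAnswer]:
    # Pretend longer pages yield more confident answers.
    return [{"answer": page.upper(), "score": len(page) / 10}]


result = answer_across_pages(["ab", "abcd", "abc"], "What is this?", fake_answer_page)
```

Picking the single best score is only one aggregation choice; returning the top candidates per page would be an equally reasonable design.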

Mar 23 '23 04:03 ankrgyl

Add support for multi-page documents (e.g. for Donut, we need to present one image per page)

Thank you! I read it carefully. To add support for multi-page documents in document_question_answering.py, should I modify methods in that file such as preprocess()? Can I open a pull request against the file you linked after modifying those methods?

Mar 24 '23 02:03 JuheonChu

@ankrgyl Hello. I would love to contribute to this task: Add tutorial documentation to Task Summary. Is it still open, and could I get pointers on how to begin working on it? Thank you.

Apr 03 '23 17:04 elabongaatuo

@elabongaatuo It seems the "Add tutorial documentation to Task Summary" task is still open. Are you working on it? It seems you need to start making changes from here

May 10 '23 05:05 y3sar

Hello @y3sar, no, I am not working on it at the moment.

May 10 '23 06:05 elabongaatuo

@elabongaatuo Then I would like to take it up, if that is no problem for you.

May 10 '23 06:05 y3sar

@y3sar, sure thing. 😊 No problem.

May 10 '23 06:05 elabongaatuo

@ankrgyl I would like to work on "Add tutorial documentation to Task Summary" and also on "Add support for multi-page documents (e.g. for Donut, we need to present one image per page)".

Jul 26 '23 07:07 rajveer43

@ankrgyl Can I work on "Refactor Donut usage"?

Aug 09 '23 17:08 hackpk

Hey @ankrgyl! I would be happy to contribute to this issue by adding support for multi-page documents. Could you assign this to me?

Oct 23 '23 05:10 dhivyeshrk

Hey! For anyone wanting to contribute, the best way is to just open a PR and link it here! We don't usually assign issues as they can be taken over in case of inactivity for example! 🤗

Oct 23 '23 10:10 ArthurZucker