Follow ups to DocumentQuestionAnswering Pipeline

Open ankrgyl opened this issue 1 year ago • 7 comments

Feature request

PR https://github.com/huggingface/transformers/pull/18414 left a number of TODOs, which we'd like to track here as follow-up tasks.

Pipeline

  • [x] Add support for documents that have more words than the tokenizer span (e.g. 512)
  • [ ] Add support for multi-page documents (e.g. for Donut, we need to present one image per page)
  • [x] Rework use of tokenizer to avoid the need for add_prefix_space=True
  • [x] Re-add support for Donut
  • [ ] Refactor Donut usage in the pipeline or move logic into the tokenizer, so that pipeline does not have as much Donut-specific code
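
For illustration only (this is not the code from PR #18414): one common way to handle documents longer than the tokenizer span is to split the input into overlapping windows and run the model on each one. The sketch below works on a plain list of words; the function name `sliding_windows` and the `max_len` / `stride` defaults are assumptions made for the example, not names taken from the pipeline.

```python
from typing import List


def sliding_windows(words: List[str], max_len: int = 512, stride: int = 128) -> List[List[str]]:
    """Split `words` into overlapping windows of at most `max_len` items.

    Consecutive windows share `stride` items, so an answer shorter than the
    overlap that straddles a boundary still appears whole in one window.
    (Hypothetical helper for illustration; not part of transformers.)
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")

    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(words[start:start + max_len])
        if start + max_len >= len(words):
            break  # this window already reaches the end of the document
        start += step
    return windows


if __name__ == "__main__":
    doc = [f"w{i}" for i in range(10)]
    for window in sliding_windows(doc, max_len=4, stride=1):
        print(window)
```

Each window would then be encoded and scored separately, and the best answer across windows kept. A real implementation also has to carry the word bounding boxes along with the words, which this sketch leaves out.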

Testing

  • [ ] Enable test_small_model_pt_donut once hf-internal-testing/tiny-random-donut is implemented

Documentation / Website

Motivation

These are follow-ups that we cut from the initial scope of PR #18414.

Your contribution

Happy to contribute many or all of these.

ankrgyl avatar Sep 07 '22 16:09 ankrgyl

cc'ing @Narsil for enabling the model on the inference API, cc'ing @stevhliu for adding tutorial documentation to the task summary

NielsRogge avatar Sep 08 '22 09:09 NielsRogge

@NielsRogge because we removed donut-swin from AutoModelForDocumentQuestionAnswering, you can no longer create a pipeline with donut, i.e.

In [2]: p = pipeline('document-question-answering', model='naver-clova-ix/donut-base-finetuned-docvqa')
/Users/ankur/projects/transformers/venv/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
The model 'VisionEncoderDecoderModel' is not supported for document-question-answering. Supported models are ['LayoutLMForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering'].

Should we add it back to that list? Or what is the best way to support that?

ankrgyl avatar Sep 09 '22 00:09 ankrgyl

Could we re-open this (I don't think I have permissions to)? There are still a few changes necessary to complete all of the checkboxes.

ankrgyl avatar Sep 26 '22 14:09 ankrgyl

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 22 '22 15:10 github-actions[bot]

@ankrgyl May I work on this? I would like to add support for multi-page documents (e.g. for Donut, we need to present one image per page). Where should I start?

JuheonChu avatar Mar 23 '23 03:03 JuheonChu

Absolutely!

Feel free to start looking here: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/document_question_answering.py

ankrgyl avatar Mar 23 '23 04:03 ankrgyl

  • Add support for multi-page documents (e.g. for Donut, we need to present one image per page)

Thank you! I carefully read it! In order to add support for multi-page documents in document_question_answering.py, should I modify some methods in that file such as preprocess()? Can I create a pull request of the file you provided after modifying those methods?

JuheonChu avatar Mar 24 '23 02:03 JuheonChu
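
As a rough sketch of what multi-page support could look like (an assumption for illustration, not a design agreed on in this thread): ask the question of each page separately and keep the highest-scoring answer. Here `qa_fn` is a hypothetical callable standing in for a single-page question-answering call, assumed to return a list of dicts with `"score"` and `"answer"` keys; the added `"page"` key is also an assumption of this example.

```python
from typing import Any, Callable, Dict, List, Sequence

Answer = Dict[str, Any]


def answer_across_pages(
    qa_fn: Callable[[Any, str], List[Answer]],
    pages: Sequence[Any],
    question: str,
) -> Answer:
    """Ask `question` of each page and return the highest-scoring answer.

    `qa_fn(page, question)` is a hypothetical single-page QA call that is
    assumed to return dicts containing at least "score" and "answer".
    The returned dict gains a "page" key holding the zero-based page index.
    """
    best = None
    for page_index, page in enumerate(pages):
        for candidate in qa_fn(page, question):
            if best is None or candidate["score"] > best["score"]:
                best = {**candidate, "page": page_index}
    if best is None:
        raise ValueError("no answers returned for any page")
    return best


if __name__ == "__main__":
    # Stand-in for a real model: canned answers keyed by page name.
    canned = {
        "page-1": [{"score": 0.2, "answer": "42"}],
        "page-2": [{"score": 0.9, "answer": "2022-09-07"}],
    }
    result = answer_across_pages(lambda p, q: canned[p], ["page-1", "page-2"], "What is the date?")
    print(result)
```

Selecting by score only works when the per-page scores are comparable. A generative model such as Donut may not expose a score at all, in which case a different selection rule would be needed.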

@ankrgyl Hello. I would love to contribute to this task: Add tutorial documentation to Task Summary. Is it still open, and may I get pointers on how to begin working on it? Thank you.

elabongaatuo avatar Apr 03 '23 17:04 elabongaatuo

@elabongaatuo It seems the "Add tutorial documentation to Task Summary" task is still open. Are you working on it? It seems the changes need to start from here

y3sar avatar May 10 '23 05:05 y3sar

Hello @y3sar, no, I am not working on it at the moment.

elabongaatuo avatar May 10 '23 06:05 elabongaatuo

@elabongaatuo Then I would like to take it up, if that is no problem for you.

y3sar avatar May 10 '23 06:05 y3sar

@y3sar, sure thing. 😊 No problem.

elabongaatuo avatar May 10 '23 06:05 elabongaatuo

@ankrgyl I would like to work on "Add tutorial documentation to Task Summary" and also on "Add support for multi-page documents (e.g. for Donut, we need to present one image per page)".

rajveer43 avatar Jul 26 '23 07:07 rajveer43

@ankrgyl Can I work on "Refactor Donut usage"?

hackpk avatar Aug 09 '23 17:08 hackpk

Hey @ankrgyl! I would be happy to contribute to this issue by adding support for multi-page documents. Could you assign this to me?

dhivyeshrk avatar Oct 23 '23 05:10 dhivyeshrk

Hey! For anyone wanting to contribute, the best way is to just open a PR and link it here! We don't usually assign issues as they can be taken over in case of inactivity for example! 🤗

ArthurZucker avatar Oct 23 '23 10:10 ArthurZucker