transformers
Follow ups to DocumentQuestionAnswering Pipeline
Feature request
PR https://github.com/huggingface/transformers/pull/18414 has a number of TODOs left over which we'd like to track as follow up tasks.
Pipeline
- [x] Add support for documents which have more than the tokenizer span (e.g. 512) words
- [ ] Add support for multi-page documents (e.g. for Donut, we need to present one image per page)
- [x] Rework use of tokenizer to avoid the need for `add_prefix_space=True`
- [x] Re-add support for Donut
- [ ] Refactor Donut usage in the pipeline or move logic into the tokenizer, so that pipeline does not have as much Donut-specific code
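
For illustration, the first item (documents with more words than the tokenizer span) can be pictured as overlapping windows over the document's words. This is a minimal sketch and not the pipeline's actual code: `window_words`, `max_words`, and `stride` are hypothetical names, and it splits on words for simplicity, whereas a real implementation would more likely work at the token level.

```python
from typing import Iterator, List, Tuple


def window_words(
    words: List[str], max_words: int = 512, stride: int = 128
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, chunk) pairs that cover `words` with overlapping windows.

    `stride` is the number of words shared by two consecutive windows, so an
    answer that straddles a window boundary still appears whole in one chunk
    (as long as it is shorter than `stride`).
    """
    if max_words <= 0 or not 0 <= stride < max_words:
        raise ValueError("need max_words > 0 and 0 <= stride < max_words")
    step = max_words - stride  # how far each new window advances
    start = 0
    while True:
        yield start, words[start:start + max_words]
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
        start += step


# Hypothetical usage: a 1000-word document split for a 512-word span.
document = [f"w{i}" for i in range(1000)]
chunks = list(window_words(document, max_words=512, stride=128))
```

Each chunk would then be run through the model separately and the per-chunk answers compared by score; the start index lets an answer be mapped back to its position in the full document.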
Testing
- [ ] Enable `test_small_model_pt_donut` once `hf-internal-testing/tiny-random-donut` is implemented
Documentation / Website
- [x] Add DocumentQuestionAnswering demo to Hosted Inference API so that model demos work
- [ ] Add tutorial documentation to Task Summary
Motivation
These are follow ups that we cut from the initial scope of PR #18414.
Your contribution
Happy to contribute many or all of these.
cc'ing @Narsil for enabling the model on the inference API, cc'ing @stevhliu for adding tutorial documentation to the task summary
@NielsRogge because we removed `donut-swin` from `AutoModelForDocumentQuestionAnswering`, you can no longer create a pipeline with Donut, i.e.
In [2]: p = pipeline('document-question-answering', model='naver-clova-ix/donut-base-finetuned-docvqa')
/Users/ankur/projects/transformers/venv/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
The model 'VisionEncoderDecoderModel' is not supported for document-question-answering. Supported models are ['LayoutLMForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering'].
Should we add it back to that list? Or what is the best way to support that?
Could we re-open this (I don't think I have permissions to)? There are still a few changes necessary to complete all of the checkboxes.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@ankrgyl May I work on this? I would like to add support for multi-page documents (e.g. for Donut, we need to present one image per page). Where should I start?
Absolutely!
Feel free to start looking here: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/document_question_answering.py
- Add support for multi-page documents (e.g. for Donut, we need to present one image per page)
Thank you! I read it carefully. To add support for multi-page documents in `document_question_answering.py`, should I modify methods in that file such as `preprocess()`? May I open a pull request for that file after modifying those methods?
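
As a rough picture of what multi-page support could look like (one image per page, as the task describes), here is a minimal sketch under stated assumptions. It is not the pipeline's code: `answer_over_pages` and `answer_page` are hypothetical names, and `answer_page` stands in for whatever single-page question answering call is used, assumed here to return a list of dicts with `score` and `answer` keys.

```python
from typing import Any, Callable, Dict, List, Sequence

Answer = Dict[str, Any]


def answer_over_pages(
    pages: Sequence[Any],
    question: str,
    answer_page: Callable[[Any, str], List[Answer]],
) -> Answer:
    """Ask `question` of every page and return the highest-scoring answer.

    `answer_page` is a hypothetical stand-in for a single-page QA call.
    The returned dict is a copy of the best candidate with an extra
    `page` key holding the zero-based page index it came from.
    """
    best = None
    for page_index, page in enumerate(pages):
        for candidate in answer_page(page, question):
            scored = {**candidate, "page": page_index}
            if best is None or scored["score"] > best["score"]:
                best = scored
    if best is None:
        raise ValueError("no answers were produced for any page")
    return best


# Hypothetical usage with a stub in place of a real model call.
def fake_answer_page(page: str, question: str) -> List[Answer]:
    # Pretend the model is confident only when the page mentions "total".
    score = 0.9 if "total" in page else 0.1
    return [{"score": score, "answer": page.split()[-1]}]


result = answer_over_pages(
    ["invoice number 42", "total due 100"], "What is the total?", fake_answer_page
)
```

Whether this loop belongs in `preprocess()`, in the pipeline's batching logic, or elsewhere is a design question for the maintainers; the sketch only shows the per-page aggregation.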
@ankrgyl Hello. I would love to contribute to this task: Add tutorial documentation to Task Summary. Is it open, and may I get pointers on how to begin working on it? Thank you.
@elabongaatuo It seems like the "Add tutorial documentation to Task Summary" task is still open. Are you working on it? It seems you need to change starting from here
Hello @y3sar , no, I am not working on it at the moment.
@elabongaatuo then I would like to take it up if there is no problem with you
@y3sar , sure thing. 😊 no problem.
@ankrgyl I would like to work on "Add tutorial documentation to Task Summary" and also on "Add support for multi-page documents (e.g. for Donut, we need to present one image per page)".
@ankrgyl Can I work on "Refactor Donut usage"?
Hey @ankrgyl ! I would be happy to contribute to this issue by adding support for multi-page documents. Could you assign this to me ?
Hey! For anyone wanting to contribute, the best way is to just open a PR and link it here! We don't usually assign issues as they can be taken over in case of inactivity for example! 🤗