transformers
document-question-answering pipeline does not work with some models
System Info
Colab, latest release
Who can help?
@NielsRogge
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
!apt install tesseract-ocr
!apt install libtesseract-dev
!pip install Pillow
!pip install pytesseract
# You can use an HTTP link, a local path or a PIL.Image object
img_path = "https://huggingface.co/spaces/impira/docquery/resolve/main/invoice.png"
from transformers import pipeline
# This works
pipe = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
# This breaks with strange error
pipe = pipeline("document-question-answering", model="impira/layoutlm-invoices")
# Error: KeyError: 'layoutlm-tc'
Expected behavior
The pipeline should work with both models.
The model_type in the config.json of this specific model seems to be wrong. The model types currently supported that would work with LayoutLM are:
- layoutlm
- layoutlmv2
- layoutlmv3
- layoutxlm

The specified type is layoutlm-tc.
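To make the mismatch concrete, here is a minimal sketch of checking a model_type against the types the pipeline knows about. The config.json excerpt below is illustrative, not the repo's full config file:

```python
import json

# Illustrative excerpt of the model's config.json before the fix
# (not the repo's actual, full config file).
config_json = '{"model_type": "layoutlm-tc"}'

# The LayoutLM-family model types the pipeline can resolve
SUPPORTED_MODEL_TYPES = {"layoutlm", "layoutlmv2", "layoutlmv3", "layoutxlm"}

model_type = json.loads(config_json)["model_type"]
print(model_type in SUPPORTED_MODEL_TYPES)  # False, so the lookup fails
```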
Cc @ankrgyl
From the transformers side, I think the error could be a bit more descriptive/informative than a bare KeyError.
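One way the lookup could fail more helpfully, sketched with a hypothetical resolve_config_class helper and a simplified mapping (not the actual transformers internals):

```python
# Simplified stand-in for the internal model_type -> config-class mapping;
# the real mapping in transformers is much larger and lazily loaded.
CONFIG_MAPPING = {
    "layoutlm": "LayoutLMConfig",
    "layoutlmv2": "LayoutLMv2Config",
    "layoutlmv3": "LayoutLMv3Config",
}

def resolve_config_class(model_type: str) -> str:
    """Hypothetical helper: turn a bare KeyError into an actionable error."""
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        raise ValueError(
            f"Unrecognized model_type {model_type!r} in config.json. "
            f"Supported types: {sorted(CONFIG_MAPPING)}"
        ) from None

print(resolve_config_class("layoutlm"))  # LayoutLMConfig
try:
    resolve_config_class("layoutlm-tc")
except ValueError as err:
    print(err)  # names the bad type and lists the supported ones
```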
I had a bit of discussion with @NielsRogge about this. The model type here is different because this model actually has a slightly different architecture than standard LayoutLM (it has an additional token classifier head). @NielsRogge was kind enough to submit a PR (https://huggingface.co/impira/layoutlm-invoices/discussions/1) which changes it to layoutlm.
With this change (now merged), your code above should run just fine. However, you will likely get suboptimal results, because the model has learned to depend on the token classifier to produce accurate results. I'd recommend running it through DocQuery (https://github.com/impira/docquery) which has a patched version of the model (here) that makes use of it.
You can do that via something like:
!apt install tesseract-ocr
!apt install libtesseract-dev
!pip install Pillow
!pip install pytesseract
!pip install docquery
# You can use an HTTP link, a local path or a PIL.Image object
img_path = "https://huggingface.co/spaces/impira/docquery/resolve/main/invoice.png"
# This is a patched version of the pipeline that knows how to use the token classifier
from docquery import pipeline
# This works
pipe = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
# This now works too, using the token classifier head
pipe = pipeline("document-question-answering", model="impira/layoutlm-invoices")
In the meantime, I'll explore a few alternatives, e.g. packaging up the model directly in the repo or patching it a different way, so that it uses the token classifier.
@NielsRogge and @osanseviero, just following up on this: we made the necessary changes in https://github.com/impira/docquery to keep the model working both in transformers directly and in DocQuery, so at least from our side, we could close this issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@osanseviero I believe this issue should be closable now (your original repro should now succeed). But please let me know if you see otherwise.
Sounds good! Thanks a lot for this!