transformers
document-question-answering pipeline does not work with some models
System Info
Colab, latest release
Who can help?
@NielsRogge
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
!apt install tesseract-ocr
!apt install libtesseract-dev
!pip install Pillow
!pip install pytesseract
# You can use an HTTP link, a local path or a PIL.Image object
img_path = "https://huggingface.co/spaces/impira/docquery/resolve/main/invoice.png"
from transformers import pipeline
# This works
pipe = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
# This breaks with strange error
pipe = pipeline("document-question-answering", model="impira/layoutlm-invoices")
# Error: KeyError: 'layoutlm-tc'
Expected behavior
The pipeline should work with both models.
The model_type in the config.json of this specific model seems to be wrong. The model types currently supported that would work with LayoutLM are:
- layoutlm
- layoutlmv2
- layoutlmv3
- layoutxlm

The specified type is layoutlm-tc.
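To make the mismatch concrete, here is a minimal sketch of checking a model_type against the types the pipeline knows about. The config.json excerpt below is illustrative, not the repo's full config file:

```python
import json

# Illustrative excerpt of the model's config.json before the fix
# (not the repo's actual, full config file).
config_json = '{"model_type": "layoutlm-tc"}'

# The LayoutLM-family model types the pipeline can resolve
SUPPORTED_MODEL_TYPES = {"layoutlm", "layoutlmv2", "layoutlmv3", "layoutxlm"}

model_type = json.loads(config_json)["model_type"]
print(model_type in SUPPORTED_MODEL_TYPES)  # False, so the lookup fails
```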
Cc @ankrgyl
From the transformers side, I think the error could be a bit more descriptive/informative than a bare KeyError.
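One way the lookup could fail more helpfully, sketched with a hypothetical resolve_config_class helper and a simplified mapping (not the actual transformers internals):

```python
# Simplified stand-in for the internal model_type -> config-class mapping;
# the real mapping in transformers is much larger and lazily loaded.
CONFIG_MAPPING = {
    "layoutlm": "LayoutLMConfig",
    "layoutlmv2": "LayoutLMv2Config",
    "layoutlmv3": "LayoutLMv3Config",
}

def resolve_config_class(model_type: str) -> str:
    """Hypothetical helper: turn a bare KeyError into an actionable error."""
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        raise ValueError(
            f"Unrecognized model_type {model_type!r} in config.json. "
            f"Supported types: {sorted(CONFIG_MAPPING)}"
        ) from None

print(resolve_config_class("layoutlm"))  # LayoutLMConfig
try:
    resolve_config_class("layoutlm-tc")
except ValueError as err:
    print(err)  # names the bad type and lists the supported ones
```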
I had a bit of discussion with @NielsRogge about this. The model type here is different because this model actually has a slightly different architecture than standard LayoutLM (it has an additional token classifier head). @NielsRogge was kind enough to submit a PR (https://huggingface.co/impira/layoutlm-invoices/discussions/1) which changes it to layoutlm.
With this change (now merged), your code above should run just fine. However, you will likely get suboptimal results, because the model has learned to depend on the token classifier to produce accurate results. I'd recommend running it through DocQuery (https://github.com/impira/docquery) which has a patched version of the model (here) that makes use of it.
You can do that via something like:
!apt install tesseract-ocr
!apt install libtesseract-dev
!pip install Pillow
!pip install pytesseract
!pip install docquery
# You can use an HTTP link, a local path or a PIL.Image object
img_path = "https://huggingface.co/spaces/impira/docquery/resolve/main/invoice.png"
# This is a patched version of the pipeline that knows how to use the token classifier
from docquery import pipeline
# This works
pipe = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
# This now works too, using the token classifier head
pipe = pipeline("document-question-answering", model="impira/layoutlm-invoices")
In the meantime, I'll explore a few alternatives, e.g. packaging up the model directly in the repo or patching it a different way, so that it uses the token classifier.
@NielsRogge and @osanseviero, just following up on this: we made the necessary changes in https://github.com/impira/docquery to keep the model working both in transformers directly and in DocQuery, so at least from our side, we could close this issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@osanseviero I believe this issue should be closable now (your original repro should now succeed). But please let me know if you see otherwise.
Sounds good! Thanks a lot for this!