
Add LayoutLMForQuestionAnswering model

ankrgyl opened this pull request 1 year ago • 8 comments

What does this PR do?

This PR adds a LayoutLMForQuestionAnswering class that follows the implementations of LayoutLMv2ForQuestionAnswering and LayoutLMv3ForQuestionAnswering, so that LayoutLM can be fine-tuned for the question answering task.

Fixes #18380
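
At a high level, the new head looks roughly like this (a simplified sketch following the v2/v3 QA heads, not the exact code in the diff):

from torch import nn
from transformers.models.layoutlm.modeling_layoutlm import LayoutLMModel, LayoutLMPreTrainedModel


class LayoutLMForQuestionAnswering(LayoutLMPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.layoutlm = LayoutLMModel(config)
        # A single linear layer maps each token's hidden state to start/end logits.
        self.qa_outputs = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids=None, bbox=None, attention_mask=None, **kwargs):
        sequence_output = self.layoutlm(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask, **kwargs)[0]
        logits = self.qa_outputs(sequence_output)  # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)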

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case: https://github.com/huggingface/transformers/issues/18380
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@Narsil

ankrgyl avatar Aug 01 '22 18:08 ankrgyl

@Narsil I've left a few TODOs -- (1) supporting TensorFlow, (2) filling in docs, (3) filling in tests -- which I'll gladly do. I just wanted to post sooner rather than later to start getting feedback on the approach.

ankrgyl avatar Aug 01 '22 18:08 ankrgyl

The documentation is not available anymore as the PR was closed or merged.

Ok, for this part I will let @NielsRogge comment as I am not the best person to answer how it should be done.

Narsil avatar Aug 02 '22 11:08 Narsil

@NielsRogge @Narsil gentle nudge on this PR. I plan to fix the tests + write docs as a next step but wanted to get some quick feedback about whether this approach is acceptable for including LayoutLMForQuestionAnswering. Appreciate your consideration!

ankrgyl avatar Aug 03 '22 06:08 ankrgyl

Thanks @NielsRogge!

We're discussing the pipeline part in PR #18414. Would love your feedback there too!

ankrgyl avatar Aug 03 '22 14:08 ankrgyl

@NielsRogge @Narsil I just updated it to include tests + documentation. If it's okay, I'd like to defer the TensorFlow implementation for now (due to my own lack of familiarity with it). As a result, however, I'm failing a consistency check:

  File "/Users/ankur/projects/transformers/transformers/utils/check_inits.py", line 298, in <module>
    check_all_inits()
  File "/Users/ankur/projects/transformers/transformers/utils/check_inits.py", line 238, in check_all_inits
    raise ValueError("\n\n".join(failures))
ValueError: Problem in src/transformers/models/layoutlm/__init__.py, both halves do not define the same objects.
Differences for tf backend:
  LayoutLMForQuestionAnswering in _import_structure but not in TYPE_HINT.
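
For reference, the check compares the lazy _import_structure dict in the model's __init__.py against the imports under TYPE_CHECKING; roughly this pattern (a sketch, unrelated entries elided):

from typing import TYPE_CHECKING

# src/transformers/models/layoutlm/__init__.py
_import_structure = {
    "modeling_layoutlm": [
        # ... other classes ...
        "LayoutLMForQuestionAnswering",
    ],
}

if TYPE_CHECKING:
    # Every name registered above must be mirrored here, under the same backend guard.
    from .modeling_layoutlm import LayoutLMForQuestionAnswering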

Could you help me resolve this?

ankrgyl avatar Aug 03 '22 16:08 ankrgyl

@NielsRogge @Narsil, I went ahead and implemented support for TensorFlow and the checks are now passing. Would appreciate a re-review.

ankrgyl avatar Aug 05 '22 02:08 ankrgyl

@NielsRogge gentle nudge on this PR :)

ankrgyl avatar Aug 09 '22 14:08 ankrgyl

Thanks @NielsRogge! I just updated with your comments, added to the list of doc tests, and verified locally that they are (now) passing.

ankrgyl avatar Aug 10 '22 14:08 ankrgyl

Up to you guys on that one!

ankrgyl avatar Aug 12 '22 13:08 ankrgyl

@NielsRogge @Narsil I did some thinking over the weekend, and I believe it makes sense to include them in AutoModelForQuestionAnswering to be consistent with LayoutLMv2 and v3. We can move the auto mapping around in PR #18414.

Let me know if you have any concerns with that thinking. If not, I'll proceed with merging the change in.

ankrgyl avatar Aug 15 '22 16:08 ankrgyl

@Narsil @NielsRogge did you have any further questions on this PR, or is it ready to merge in?

ankrgyl avatar Aug 17 '22 15:08 ankrgyl

Also happy to hold off, since we have some traction with PR #18414, and just wait to include it in AutoModelForDocumentQuestionAnswering there?

ankrgyl avatar Aug 22 '22 15:08 ankrgyl

Hi @Narsil @NielsRogge just wanted to bump on this -- based on the most recent round of comments on PR #18414, we removed LayoutLMv2ForQuestionAnswering and LayoutLMv3ForQuestionAnswering from AutoModelForQuestionAnswering, so I think it makes sense to not add LayoutLMForQuestionAnswering to the auto mapping, if we are about to remove it.

I will go ahead and remove it and update the PR. Please let me know if it's ready to move forward. It would be very helpful to rebase PR #18414 against it for testing purposes.

ankrgyl avatar Aug 24 '22 15:08 ankrgyl

This PR seems almost ready; I'd just update:

  • all code examples to use either LayoutLMTokenizer or AutoTokenizer
  • add a working code example of LayoutLMForQuestionAnswering/TFLayoutLMForQuestionAnswering, with an expected output (something along the lines of the sketch below)
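
Something along these lines would do (a sketch only: the checkpoint is the one from this thread, but the document words, boxes, and exact tokenizer arguments here are illustrative):

import torch
from transformers import AutoTokenizer, LayoutLMForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("impira/layoutlm-document-qa", add_prefix_space=True)
model = LayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa")

question = "What is the invoice number?"
words = ["Invoice", "number:", "INV-1234"]
# Word bounding boxes, normalized to the 0-1000 scale LayoutLM expects.
boxes = [[75, 50, 160, 65], [165, 50, 240, 65], [245, 50, 340, 65]]

encoding = tokenizer(
    question.split(), words, is_split_into_words=True, return_token_type_ids=True, return_tensors="pt"
)

# One box per token: real boxes for document tokens, [1000]*4 for [SEP], [0]*4 otherwise.
bbox = []
for input_id, seq_id, word_id in zip(encoding.input_ids[0], encoding.sequence_ids(0), encoding.word_ids(0)):
    if seq_id == 1:
        bbox.append(boxes[word_id])
    elif int(input_id) == tokenizer.sep_token_id:
        bbox.append([1000] * 4)
    else:
        bbox.append([0] * 4)
encoding["bbox"] = torch.tensor([bbox])

outputs = model(**encoding)
start = int(outputs.start_logits.argmax(-1))
end = int(outputs.end_logits.argmax(-1))
word_ids = encoding.word_ids(0)
print(" ".join(words[word_ids[start] : word_ids[end] + 1]))  # prints the predicted answer span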

NielsRogge avatar Aug 24 '22 16:08 NielsRogge

I actually don't have a pre-trained TFLayoutLMForQuestionAnswering (i.e. one with tensorflow weights), but I could use the same code and just reference the base model?

I'll make the other updates now.

ankrgyl avatar Aug 24 '22 16:08 ankrgyl

I actually don't have a pre-trained TFLayoutLMForQuestionAnswering (i.e. one with tensorflow weights), but I could use the same code and just reference the base model?

The Transformers library makes sure that any PyTorch model also works in TensorFlow, and vice versa, because the same variable names are used in both frameworks. So you can just do:

from transformers import TFLayoutLMForQuestionAnswering

# from_pt=True converts the PyTorch checkpoint on the fly, relying on matching variable names.
model = TFLayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", from_pt=True)

and it should work (this is also normally covered by the PT-TF cross-equivalence test). You can then perhaps do model.push_to_hub("impira/layoutlm-document-qa") to upload the TF weights to the same repo. This way, you can remove the from_pt argument.
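
In code, the follow-up would be roughly:

# Upload the converted TF weights to the same Hub repo:
model.push_to_hub("impira/layoutlm-document-qa")

# From then on, the TF weights load directly, without from_pt:
model = TFLayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa")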

NielsRogge avatar Aug 24 '22 16:08 NielsRogge

Wow, that is super cool! Okay let me give it a try.

ankrgyl avatar Aug 24 '22 16:08 ankrgyl

Ok @NielsRogge I've made all of these changes. It was a really nice idea to put a fully working example in there. I've also pushed the TF weights to the hub.

ankrgyl avatar Aug 24 '22 17:08 ankrgyl

@NielsRogge @Narsil the test failures are now occurring because LayoutLMForQuestionAnswering is not in any auto mapping (for example, tests/test_modeling_tf_common.py:_prepare_for_class uses the auto mapping to determine the expected output labels). I'm not sure of the best way to proceed. Perhaps we include it in the QuestionAnswering mapping just to keep the commit (a) consistent with LayoutLMv2-3 and (b) passing tests, and then solve the auto mapping issue properly in PR #18414?

ankrgyl avatar Aug 24 '22 18:08 ankrgyl

@ankrgyl normally if you run make fixup and it complains about a model not being in any auto mapping, you can add it to utils/check_repo.py in the IGNORE_NON_AUTO_CONFIGURED mapping.

Then, in #18414, you can remove it from this mapping and add it to the auto mapping instead.
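
i.e. something like this in utils/check_repo.py (a sketch; surrounding entries elided):

IGNORE_NON_AUTO_CONFIGURED = [
    # ... existing entries ...
    "LayoutLMForQuestionAnswering",
    "TFLayoutLMForQuestionAnswering",
]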

NielsRogge avatar Aug 25 '22 07:08 NielsRogge

@ankrgyl normally if you run make fixup and it complains about a model not being in any auto mapping, you can add it to utils/check_repo.py in the IGNORE_NON_AUTO_CONFIGURED mapping.

@NielsRogge I've actually already added it there, and the tests still fail :(. The reason is that I've included it in tests/models/layoutlm/test_modeling_layoutlm.py:LayoutLMModelTest.all_model_classes. I feel like there's a tradeoff here: I can either exclude it from all tests, or put it into the QuestionAnswering auto class and then remove it shortly in PR #18414. Let me know what you think is best.

ankrgyl avatar Aug 26 '22 05:08 ankrgyl

Following up on this @NielsRogge @Narsil @sgugger, could you please advise on how to proceed? It seems that if something has tests then it must be in an Auto model list (the failing tests are due to LayoutLMForQuestionAnswering not being part of any Auto model).

Please correct me if I'm wrong, but my understanding is that we have the following options for how to proceed:

  1. Add LayoutLMForQuestionAnswering to the AutoModelForQuestionAnswering mapping, which will make the tests pass (see the sketch after this list). I'll remove it shortly after in https://github.com/huggingface/transformers/pull/18414.
  2. Remove all tests about LayoutLMForQuestionAnswering and add them in https://github.com/huggingface/transformers/pull/18414.
  3. Add AutoModelForDocumentQuestionAnswering in this PR, and then simply extend/use it in PR #18414.
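
For reference, option 1 would amount to roughly the following edit (a sketch; the mapping lives in src/transformers/models/auto/modeling_auto.py, surrounding entries elided):

from collections import OrderedDict

MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("layoutlm", "LayoutLMForQuestionAnswering"),
    ]
)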

ankrgyl avatar Aug 30 '22 00:08 ankrgyl

To make all tests pass, you need to override the _prepare_for_class method defined in test_modeling_common.py, to make sure the targets are prepared correctly for LayoutLMForQuestionAnswering. It can be defined as follows in test_modeling_layoutlm.py:

def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
    inputs_dict = copy.deepcopy(inputs_dict)
    if return_labels:
        if model_class in get_values(MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING):
            # Sequence classification: one label per example.
            inputs_dict["labels"] = torch.zeros(
                self.model_tester.batch_size, dtype=torch.long, device=torch_device
            )
        elif model_class in [
            *get_values(MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING),
            *get_values(MODEL_FOR_MASKED_LM_MAPPING),
        ]:
            # Token-level tasks: one label per token.
            inputs_dict["labels"] = torch.zeros(
                (self.model_tester.batch_size, self.model_tester.seq_length), dtype=torch.long, device=torch_device
            )
        elif model_class.__name__ == "LayoutLMForQuestionAnswering":
            # Matched by class name, since the model is not in any auto mapping yet.
            inputs_dict["start_positions"] = torch.zeros(
                self.model_tester.batch_size, dtype=torch.long, device=torch_device
            )
            inputs_dict["end_positions"] = torch.zeros(
                self.model_tester.batch_size, dtype=torch.long, device=torch_device
            )

    return inputs_dict

This can then be removed once the model is added to an Auto mapping.

The same needs to happen for the TF model.
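
For the TF side, the equivalent override in test_modeling_tf_layoutlm.py could look like this (a sketch mirroring the PyTorch version above; surrounding test-class code elided):

import copy

import tensorflow as tf


def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
    inputs_dict = copy.deepcopy(inputs_dict)
    if return_labels and model_class.__name__ == "TFLayoutLMForQuestionAnswering":
        # QA targets: one start index and one end index per example.
        inputs_dict["start_positions"] = tf.zeros(self.model_tester.batch_size, dtype=tf.int32)
        inputs_dict["end_positions"] = tf.zeros(self.model_tester.batch_size, dtype=tf.int32)
    return inputs_dict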

NielsRogge avatar Aug 30 '22 07:08 NielsRogge

Regarding the failing tests - you might need to rebase with the main branch.

Also note that sometimes, tests which are totally unrelated to your PR fail, in which case you can ignore them.

NielsRogge avatar Aug 30 '22 14:08 NielsRogge

Thanks @NielsRogge, just rebased.

ankrgyl avatar Aug 30 '22 15:08 ankrgyl

@NielsRogge I believe all outstanding comments have been addressed. Are we ready to merge this in?

ankrgyl avatar Aug 30 '22 15:08 ankrgyl

I've pinged @sgugger for a final review; however, he's off this week, so it will be merged next week :)

NielsRogge avatar Aug 30 '22 16:08 NielsRogge

Thank you for merging it in! @LysandreJik or @NielsRogge are you planning to do any sort of announcement? I'm asking because we're going to publicly announce the project we've been working on (https://github.com/impira/docquery) in the next few days, and it would be great to collaborate.

ankrgyl avatar Aug 31 '22 13:08 ankrgyl

I'd like to communicate on that once the pipeline is merged, because the Space above is using that, right?

Also, the doc tests don't seem to pass:

_ [doctest] transformers.models.layoutlm.modeling_layoutlm.LayoutLMForQuestionAnswering.forward _
1328         ...         bbox.append([0] * 4)
1329         >>> encoding["bbox"] = torch.tensor([bbox])
1330 
1331         >>> word_ids = encoding.word_ids(0)
1332         >>> outputs = model(**encoding)
1333         >>> loss = outputs.loss
1334         >>> start_scores = outputs.start_logits
1335         >>> end_scores = outputs.end_logits
1336         >>> start, end = word_ids[start_scores.argmax(-1)], word_ids[end_scores.argmax(-1)]
1337         >>> print(" ".join(words[start : end + 1]))
Expected:
    M. Hamann P. Harper, P. Martinez
Got:
    J. S. Wigand

/__w/transformers/transformers/src/transformers/models/layoutlm/modeling_layoutlm.py:1337: DocTestFailure
_ [doctest] transformers.models.layoutlm.modeling_tf_layoutlm.TFLayoutLMForQuestionAnswering.call _
1553         ...         bbox.append([0] * 4)
1554         >>> encoding["bbox"] = tf.convert_to_tensor([bbox])
1555 
1556         >>> word_ids = encoding.word_ids(0)
1557         >>> outputs = model(**encoding)
1558         >>> loss = outputs.loss
1559         >>> start_scores = outputs.start_logits
1560         >>> end_scores = outputs.end_logits
1561         >>> start, end = word_ids[tf.math.argmax(start_scores, -1)[0]], word_ids[tf.math.argmax(end_scores, -1)[0]]
1562         >>> print(" ".join(words[start : end + 1]))
Expected:
    M. Hamann P. Harper, P. Martinez
Got:
    <BLANKLINE>

NielsRogge avatar Sep 01 '22 07:09 NielsRogge

Hi @ankrgyl Thanks a lot for adding (TF)LayoutLMForQuestionAnswering!

For the doctest:

  • TFLayoutLMForQuestionAnswering seems to have an issue loading the weights for qa_outputs. Could you check if the TF checkpoint in impira/layoutlm-document-qa has weights for this part (a quick inspection sketch follows this list), or see if you can find what goes wrong? The warning message is

    Some layers of TFLayoutLMForQuestionAnswering were not initialized from the model checkpoint at impira/layoutlm-document-qa and are newly initialized: ['qa_outputs']
    

    and I actually got some random results for this test.

  • LayoutLMForQuestionAnswering weight loading looks fine, but the output is different from the expected value. Could you take a look here?
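
One quick way to inspect the checkpoint (a sketch; tf_model.h5 is the standard TF weights filename, and the layer layout is an assumption):

import h5py
from huggingface_hub import hf_hub_download

# Download the TF weights from the Hub and list the top-level layer groups.
path = hf_hub_download(repo_id="impira/layoutlm-document-qa", filename="tf_model.h5")
with h5py.File(path, "r") as f:
    print(list(f.keys()))  # a "qa_outputs" group should appear if the head was saved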

Here is how you can run the doctests:

First

python utils/prepare_for_doc_test.py src/transformers/utils/doc.py

Then for LayoutLMForQuestionAnswering:

python utils/prepare_for_doc_test.py src/transformers/models/layoutlm/modeling_layoutlm.py
pytest --doctest-modules src/transformers/models/layoutlm/modeling_layoutlm.py -sv --doctest-continue-on-failure

For TFLayoutLMForQuestionAnswering:

python utils/prepare_for_doc_test.py src/transformers/models/layoutlm/modeling_tf_layoutlm.py
pytest --doctest-modules src/transformers/models/layoutlm/modeling_tf_layoutlm.py -sv --doctest-continue-on-failure

Thank you again! If you have trouble debugging this, let me know :-)

ydshieh avatar Sep 01 '22 08:09 ydshieh

Hi @NielsRogge @ydshieh I'm very sorry about that -- what happened is that we've updated the weights on the underlying model and it's returning a different name from the same document (the question itself is slightly ambiguous).

I've confirmed that if I pin the revision in the tests, they pass. I've just submitted https://github.com/huggingface/transformers/pull/18854 to resolve that.
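
Concretely, the pin looks roughly like this in the doc example (a sketch; the actual commit hash is in the linked PR):

from transformers import LayoutLMForQuestionAnswering

# Pin to a fixed revision so future weight updates can't change doctest outputs.
model = LayoutLMForQuestionAnswering.from_pretrained(
    "impira/layoutlm-document-qa", revision="<pinned-commit-sha>"  # hypothetical placeholder
)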

I'll investigate the weights in impira/layoutlm-document-qa in parallel.

ankrgyl avatar Sep 01 '22 14:09 ankrgyl

I'd like to communicate on that once the pipeline is merged, because the Space above is using that, right?

@NielsRogge the Space is indeed using the pipeline (and incorporates Donut too). It makes sense to do the announcement after that lands. We'll still do ours today but simply mention that we are working to upstream changes. Let me know if y'all have any concerns about that.

ankrgyl avatar Sep 01 '22 14:09 ankrgyl

Hi @NielsRogge @ydshieh I'm very sorry about that -- what happened is that we've updated the weights on the underlying model and it's returning a different name from the same document (the question itself is slightly ambiguous).

No problem, thanks for the fix.

I've confirmed that if I pin the revision in the tests, they pass. I've just submitted #18854 to resolve that.

Great!

ydshieh avatar Sep 01 '22 14:09 ydshieh