
How to get LayoutLMv2 output as key-value pairs?

Open avinashok opened this issue 2 years ago • 22 comments

Model I am using is LayoutLMv2:

(Link of the demo for reference: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD )

I get 'questions' & 'answers' as separate colored boxes in the output image. But is there a way to get them as a Python dictionary (key-value pairs), i.e. the questions become keys and the answers become their corresponding values?

avinashok avatar Oct 07 '21 16:10 avinashok

Hi,

This is definitely on my roadmap. The LayoutLMv2 authors defined another model called LayoutLMv2ForRelationExtraction, which does exactly that. However, they did not specify how to use the model at inference time, so I need to look into it more to understand how it works.

If you have the time to look into it, let me know, then we can add it to HuggingFace Transformers.

NielsRogge avatar Oct 13 '21 08:10 NielsRogge

Hi @NielsRogge,

Thanks for replying, and glad to know it is already on your roadmap. I tried grouping the questions and answers based on the pixel positions of the layout boxes, but there is a fair bit of heuristics involved, which is why I thought of reaching out to you directly.

What I tried:

## From the code: https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD/blob/386ad78844f905dfbb81072908c51ba344427587/app.py

# run the feature extractor (OCR + resizing) to get the words and their bounding boxes
encoding_feature_extractor = feature_extractor(image, return_tensors="pt")
words, boxes = encoding_feature_extractor.words, encoding_feature_extractor.boxes

####
""" The rest of the code (tokenization, forward pass, building true_predictions / true_boxes) comes here."""
####

# collect (label, prediction, box, color) for every predicted box
layout_details = []
for prediction, box in zip(true_predictions, true_boxes):
    predicted_label = iob_to_label(prediction).lower()
    layout_details.append((predicted_label, prediction, box, label2color[predicted_label]))

#### Further, skipping the special tokens at either end:

# print each word next to its predicted label, box and color
for i, j in zip(words[0], layout_details[1:-1]):
    print(i, j)

This gives the corresponding tag, pixel position and word for each layout block, which can then be grouped based on word position. I used a threshold: if an answer block lies within the threshold distance of a question block, the two are associated as a key-value pair, and so on.
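For illustration, here is a rough sketch of that proximity heuristic (not the exact code I used). The box_distance helper, the threshold value and the grouping logic are all made up for the example and would need tuning per layout; it also assumes layout_details[1:-1] is aligned word-by-word with words[0], as in the loop above.

# Hypothetical sketch: pair each predicted 'answer' word with the nearest
# 'question' word, as long as their boxes lie within a pixel threshold.

def box_distance(q_box, a_box):
    # boxes are (x0, y0, x1, y1); rough distance from the question's right edge
    # to the answer's left edge, plus the offset between their top edges
    return abs(a_box[0] - q_box[2]) + abs(a_box[1] - q_box[1])

def group_key_values(words, layout_details, threshold=150):
    questions, answers = [], []
    for word, (label, _, box, _) in zip(words, layout_details):
        if label == "question":
            questions.append((box, word))
        elif label == "answer":
            answers.append((box, word))
    if not questions or not answers:
        return {}
    pairs = {}
    for a_box, a_word in answers:
        q_box, q_word = min(questions, key=lambda q: box_distance(q[0], a_box))
        if box_distance(q_box, a_box) < threshold:
            # real code would first merge consecutive question words into one phrase
            pairs.setdefault(q_word, []).append(a_word)
    return {k: " ".join(v) for k, v in pairs.items()}

key_value_pairs = group_key_values(words[0], layout_details[1:-1])
print(key_value_pairs)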

Also, I was referring to line 139 of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py, based on the response in https://github.com/microsoft/unilm/issues/465.

I'll definitely take a look at LayoutLMv2ForRelationExtraction.

avinashok avatar Oct 13 '21 15:10 avinashok

Hi @NielsRogge,

Thank you for your amazing work.

I added the LayoutLMv2ForRelationExtraction class to modeling_layoutlmv2.py.

import torch

from transformers import LayoutLMv2ForRelationExtraction  # custom class added to modeling_layoutlmv2.py

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LayoutLMv2ForRelationExtraction.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")
model.to(device)

Here is the output:

Some weights of the model checkpoint at nielsr/layoutlmv2-finetuned-funsd were not used when initializing LayoutLMv2ForRelationExtraction: ['classifier.weight', 'classifier.bias']
- This IS expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMv2ForRelationExtraction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LayoutLMv2ForRelationExtraction were not initialized from the model checkpoint at nielsr/layoutlmv2-finetuned-funsd and are newly initialized: ['extractor.rel_classifier.linear.bias', 'extractor.rel_classifier.linear.weight', 'extractor.ffnn_head.3.bias', 'extractor.ffnn_tail.0.bias', 'extractor.ffnn_head.0.weight', 'extractor.ffnn_head.0.bias', 'extractor.rel_classifier.bilinear.weight', 'extractor.ffnn_tail.3.weight', 'extractor.ffnn_tail.0.weight', 'extractor.entity_emb.weight', 'extractor.ffnn_tail.3.bias', 'extractor.ffnn_head.3.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What would be the guidance on the next step? Does the pre-trained model only contain the Semantic Entity Recognition part?

https://github.com/microsoft/unilm/issues/429 and https://github.com/microsoft/unilm/issues/465 are related. Unfortunately, https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py does not contain an if training_args.do_predict: block at the end.

Isydmr avatar Oct 26 '21 15:10 Isydmr

Any update, guys? Using LayoutXLM separately just for linking would not make sense, and for Semantic Entity Recognition LayoutLMv2 looks better than LayoutXLM judging by the numbers on the FUNSD dataset.

Maybe we can put together a plan to get LayoutLMv2 to do Relation Extraction as well, so that we have LayoutLMv2 itself for both SER and RE.

abdksyed avatar Nov 15 '21 11:11 abdksyed

mark

WenmuZhou avatar Nov 16 '21 12:11 WenmuZhou

Hi @Isydmr, @avinashok, can you please share the inference pipeline for the RelationExtraction model?

And is there a way to convert the results of LayoutLMv2 into a key-value format?

fadi212 avatar Dec 08 '21 09:12 fadi212

@fadi212 @abdksyed @avinashok In the thread above, someone has suggested a solution with a working Colab example that you can use.

They are also fixing it up and adding this class in a separate pull request, for those who want to wait for a proper release.

mattdeeperinsights avatar Feb 17 '22 10:02 mattdeeperinsights

> Also, I was referring to line 139 of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_re.py, based on the response in https://github.com/microsoft/unilm/issues/465.

Hi, were you able to find the key-value pairs?

anamtaamin avatar Apr 09 '22 09:04 anamtaamin

Hi

Is there any update on this, i.e. getting the output as key-value pairs? @avinashok, can you please share the complete code of the solution you mentioned above?

jyotiyadav94 avatar Jun 03 '22 09:06 jyotiyadav94

> Hi
>
> Is there any update on this, i.e. getting the output as key-value pairs? @avinashok, can you please share the complete code of the solution you mentioned above?

@jyotiyadav94 I can see @R0bk already mentioned the solution in one of the comments above, with a Colab notebook version of it.

avinashok avatar Jun 19 '22 00:06 avinashok

> Hi,
>
> This is definitely on my roadmap. The LayoutLMv2 authors defined another model called LayoutLMv2ForRelationExtraction, which does exactly that. However, they did not specify how to use the model at inference time, so I need to look into it more to understand how it works.
>
> If you have the time to look into it, let me know, then we can add it to HuggingFace Transformers.

Hi, how can we get key-value extraction, like {'invoice number': '123456', 'date': '23/04/2022', 'amount': '44987', ...}?

aditya11ad avatar Jun 21 '22 08:06 aditya11ad

@jyotiyadav94 @aditya11ad Did you find a solution? I saw the Colab (from @avinashok), but I can't find a way to extract key-value pairs.

yellowjs0304 avatar Jul 12 '22 02:07 yellowjs0304

Hi @yellowjs0304, I basically used this approach https://medium.com/mlearning-ai/ai-in-the-real-world-form-processing-c96912d80ef2 to get the key-value pairs.

jyotiyadav94 avatar Jul 13 '22 06:07 jyotiyadav94

@jyotiyadav94 Thank you for sharing the idea. Does this approach also work with other OCR engines? (Not Tesseract OCR; I get the OCR results separately.)

yellowjs0304 avatar Jul 13 '22 06:07 yellowjs0304

@yellowjs0304 Can you provide me with your Gmail ID? I will share the complete link to the code for this.

jyotiyadav94 avatar Jul 13 '22 07:07 jyotiyadav94

@jyotiyadav94 Sure, the contact mail is at the top of my profile readme. Thank you :)

yellowjs0304 avatar Jul 13 '22 07:07 yellowjs0304

@jyotiyadav94 You said you were going to share the complete code. Where can I find it?

thanks!

NurielWainstein avatar Jul 31 '22 06:07 NurielWainstein

@nurielw05 Sorry, I saw this really late. Jyoti shared this link, which is related to the post above.

yellowjs0304 avatar Aug 10 '22 00:08 yellowjs0304

@jyotiyadav94 this only works if the value is to the right of the key. What if the value is below the key?

Like this:

total:    name:
2323      nuriel
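One way to handle that case might be to extend the earlier distance heuristic so that a value box also counts as "close" to a key box when it overlaps it horizontally and sits within a small vertical gap below it. Just a sketch for illustration; the helper and the max_gap value are hypothetical and untested:

# Hypothetical helper: does the answer box sit directly below the question box?
def is_below(q_box, a_box, max_gap=40):
    qx0, qy0, qx1, qy1 = q_box
    ax0, ay0, ax1, ay1 = a_box
    horizontal_overlap = min(qx1, ax1) - max(qx0, ax0) > 0
    vertical_gap = ay0 - qy1
    return horizontal_overlap and 0 <= vertical_gap <= max_gap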

NurielWainstein avatar Aug 10 '22 07:08 NurielWainstein

Hello!

I'm having the same issue as all of you, actually with this notebook.

In the inference part, it is not clear how we can build the entities list (i.e. define tails and heads) when our input is an image and we extract entities using LayoutLMv2ForTokenClassification (for example). Tails and heads are not given by the model. Could you please update the notebook with an inference example that uses only an image as input? Many thanks in advance!

hjerbii avatar Aug 12 '22 09:08 hjerbii

> Tails and heads are not given by the model

=> Tails are questions, and answers are heads (or vice versa). So LayoutLMv2ForTokenClassification does provide you that.
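To make that concrete, here is a rough sketch (my own assumptions, not an official API) of how the IOB predictions from LayoutLMv2ForTokenClassification could be turned into the entities dict that the XFUN-style relation-extraction head used in run_xfun_re.py expects, i.e. start/end token indices plus a label per entity. The label ids (1 = question, 2 = answer) and the dict format follow my reading of layoutlmft and may differ in other implementations:

# Sketch: group consecutive B-/I- token predictions into entity spans.
# predictions is a list of IOB tag strings (e.g. "B-QUESTION", "I-ANSWER", "O"),
# one per token, aligned with the tokenized input. Stray I- tags without a
# matching B- are ignored; HEADER entities are skipped since RE only links
# questions to answers.

LABEL2ID = {"question": 1, "answer": 2}  # assumed XFUN convention

def build_entities(predictions):
    entities = {"start": [], "end": [], "label": []}
    current_label, start = None, None
    for idx, tag in enumerate(predictions + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and current_label == tag[2:].lower():
            continue  # still inside the current entity
        # any other tag closes the span that was open, if it is a question/answer
        if current_label in LABEL2ID:
            entities["start"].append(start)
            entities["end"].append(idx)
            entities["label"].append(LABEL2ID[current_label])
        if tag.startswith("B-"):
            current_label, start = tag[2:].lower(), idx
        else:
            current_label, start = None, None
    return entities

If I read the layoutlmft code correctly, the RE decoder then builds its own candidate relations from all question/answer entity pairs, so at inference time the main missing ingredient is this entities dict (plus a relations dict with empty lists in the format the model expects); please double-check against run_xfun_re.py.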

NielsRogge avatar Aug 12 '22 10:08 NielsRogge

But how do we get the IDs where the entities start/end?

hjerbii avatar Aug 12 '22 11:08 hjerbii

Hi @avinashok, can you share the code of the heuristic approach you used to group the questions and answers?

Hi @NielsRogge, I've been following your work for quite a while and have learned a lot, great work! Has there been any progress on this (i.e. getting the output as {"key": "value"})? I am struggling to post-process the predictions into this form (I have tried a bunch of ways, but each fails in one scenario or another). Any help or resources on this would help a lot :)

Gladiator07 avatar Jan 27 '23 09:01 Gladiator07

> Any update, guys? Using LayoutXLM separately just for linking would not make sense, and for Semantic Entity Recognition LayoutLMv2 looks better than LayoutXLM judging by the numbers on the FUNSD dataset.
>
> Maybe we can put together a plan to get LayoutLMv2 to do Relation Extraction as well, so that we have LayoutLMv2 itself for both SER and RE.

Hi, were you able to use LayoutLMv2 for the Relation Extraction task on the FUNSD dataset? Please share the relevant code/methods to convert and process the dataset.

munish0838 avatar Mar 13 '23 12:03 munish0838