Need help with key-value pair extraction.

Laxmi530 opened this issue on Jul 14 '22 · 3 comments

Can someone please guide me on how I can extract key-value pairs from a scanned invoice using LayoutLM?

Laxmi530 · Jul 14 '22

Refer to https://github.com/huggingface/transformers/issues/15451#issue-1120232737

NielsRogge · Jul 14 '22

The relation extraction model worked quite badly on real data for me. Maybe I failed at training or data preparation; maybe you will be more successful with it.

My advice is to take the output of layoutlmv2_for_token_classification and form the key-value pairs with some algorithmic logic. You will need a module that groups text based on the predicted labels, the positions of tokens with the same label, the positions of tokens with different labels, and so on.

I can't share the code, but it works.
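
For illustration only, a rough sketch of that kind of grouping logic. The FUNSD-style QUESTION/ANSWER labels, the line tolerance, and the pairing heuristic are assumptions, not the code actually used here:

def group_tokens(tokens, labels, boxes, y_tol=10):
    # Merge consecutive tokens that share a label and sit on roughly the same line.
    # Each box is a normalized [x0, y0, x1, y1] list (assumption).
    groups = []
    for tok, lab, box in zip(tokens, labels, boxes):
        if groups and groups[-1]["label"] == lab and abs(box[1] - groups[-1]["box"][1]) <= y_tol:
            groups[-1]["text"] += " " + tok
            groups[-1]["box"][2] = max(groups[-1]["box"][2], box[2])
        else:
            groups.append({"label": lab, "text": tok, "box": list(box)})
    return groups

def pair_key_values(groups):
    # Pair each QUESTION group with the closest ANSWER group to its right or below.
    keys = [g for g in groups if g["label"] == "QUESTION"]
    values = [g for g in groups if g["label"] == "ANSWER"]
    pairs = []
    for k in keys:
        best = min(
            values,
            key=lambda v: abs(v["box"][1] - k["box"][1]) * 3 + max(v["box"][0] - k["box"][2], 0),
            default=None,
        )
        if best is not None:
            pairs.append((k["text"], best["text"]))
    return pairs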

fraps12 · Jul 27 '22

@fraps12 Thanks for the reply.

I just wanted to see how the model predicts first; after that I will decide whether to fine-tune or train from scratch. This is what I have tried so far, but I am getting an error. Can you please help me fix it?

# Missing imports added for completeness; `path` points to the local
# LayoutXLM/LayoutLMv2 checkpoint directory.
import numpy as np
import pytesseract
import torch
from PIL import Image
from transformers import AutoTokenizer, LayoutLMv2FeatureExtractor, LayoutLMv2ForRelationExtraction

feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = AutoTokenizer.from_pretrained(path, pad_token='<pad>')  # pad token was garbled in the original post; '<pad>' is assumed
model = LayoutLMv2ForRelationExtraction.from_pretrained(path)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

image_file = 'image4.png'
image = Image.open(image_file).convert('RGB')
image  # display the page in the notebook

# Run Tesseract OCR and scale the word boxes to the 0-1000 range expected by LayoutLM.
width, height = image.size
w_scale = 1000 / width
h_scale = 1000 / height
ocr_data = pytesseract.image_to_data(image, output_type='data.frame')
ocr_data = ocr_data.dropna()
ocr_data = ocr_data.assign(left_scaled=ocr_data.left * w_scale,
                           width_scaled=ocr_data.width * w_scale,
                           top_scaled=ocr_data.top * h_scale,
                           height_scaled=ocr_data.height * h_scale,
                           right_scaled=lambda x: x.left_scaled + x.width_scaled,
                           bottom_scaled=lambda x: x.top_scaled + x.height_scaled)
float_cols = ocr_data.select_dtypes('float').columns
ocr_data[float_cols] = ocr_data[float_cols].round(0).astype(int)
ocr_data = ocr_data.replace(r'^\s*$', np.nan, regex=True)
ocr_data = ocr_data.dropna().reset_index(drop=True)
words = list(ocr_data.text)

coordinates = ocr_data[['left', 'top', 'width', 'height']]
actual_boxes = []
for idx, row in coordinates.iterrows():
    x, y, w, h = tuple(row)  # the row comes in (left, top, width, height) format
    actual_box = [x, y, x + w, y + h]  # turn it into (left, top, left + width, top + height) to get the actual box
    actual_boxes.append(actual_box)

def normalize_box(box, width, height):
    return [
        int(1000 * (box[0] / width)),
        int(1000 * (box[1] / height)),
        int(1000 * (box[2] / width)),
        int(1000 * (box[3] / height)),
    ]

boxes = []
for box in actual_boxes:
    boxes.append(normalize_box(box, width, height))

encoding = tokenizer.encode_plus(words, boxes=boxes, return_tensors='pt')
input_id = encoding['input_ids']
attention_masks = encoding['attention_mask']
boxes = encoding['bbox']
encoding.keys()
outputs = model(**encoding)

This is the error:

AttributeError                            Traceback (most recent call last)
c:\Users\name\Parallel\Trans_LayoutXLM.ipynb Cell 9 in <cell line: 1>()
----> 1 outputs = model(**encoding)

File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\transformers\models\layoutlmv2\modeling_layoutlmv2.py:1585, in LayoutLMv2ForRelationExtraction.forward(self, input_ids, bbox, labels, image, attention_mask, token_type_ids, position_ids, head_mask, entities, relations)
   1522 @add_start_docstrings_to_model_forward(LAYOUTLMV2_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
   1523 @replace_return_docstrings(output_type=RegionExtractionOutput, config_class=_CONFIG_FOR_DOC)
   1524 def forward(
   (...)
   1535     relations=None,
   1536 ):
   1537     r"""
   1538     entities (list of dicts of shape `(batch_size,)` where each dict contains:
   1539         {
   (...)
   1582     >>> relations = *****
   1583     ```"""
-> 1585     outputs = self.layoutlmv2(
   1586         input_ids=input_ids,
   1587         bbox=bbox,
   1588         image=image,
   1589         attention_mask=attention_mask,
   1590         token_type_ids=token_type_ids,
   1591         position_ids=position_ids,
   1592         head_mask=head_mask,
...
--> 590     images_input = ((images if torch.is_tensor(images) else images.tensor) - self.pixel_mean) / self.pixel_std
    591     features = self.backbone(images_input)
    592     features = features[self.out_feature_key]

AttributeError: 'NoneType' object has no attribute 'tensor'
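
The traceback shows that `image` is None inside the visual backbone: the feature extractor is created but its output is never added to `encoding`, so the model receives no page image. A rough sketch of how that input is usually supplied (the `pixel_values` key comes from LayoutLMv2FeatureExtractor; note that LayoutLMv2ForRelationExtraction additionally expects `entities` and `relations` inputs, which are not shown here):

features = feature_extractor(image, return_tensors='pt')    # resized page image
encoding['image'] = features['pixel_values']                # the model's forward expects it under `image`
encoding = {k: v.to(device) for k, v in encoding.items()}
outputs = model(**encoding)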

Laxmi530 · Jul 29 '22

I am able to extract key-value pairs now, hence closing the issue.

Laxmi530 · Oct 17 '22

Hello @Laxmi530, could you please explain how you got the key-value pairs? Did you use the LayoutLmForRelationExtraction model?

Thanks!

hjerbii · Oct 26 '22

I used LayoutLMv2 for the key-value pair extraction. From the form-understanding output, set the question as the key and the answer as the value. You need to apply some additional logic on top.

Laxmi530 · Oct 27 '22

Thanks for your answer, @Laxmi530.

I used LayoutLMv2 for the key-value pair extraction.

Do you mean LayoutLMv2 for token classification, or is it another model?

Is it possible to share your approach for linking the questions and answers? LayoutLMv2 for token classification operates only at the token level, i.e., it does not detect full questions/answers. So sometimes it is not possible to associate the tokens to get full keys/values.

Thanks a lot!

hjerbii · Oct 27 '22

I used this:

from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2ForTokenClassification

feature_extractor = LayoutLMv2FeatureExtractor.from_pretrained("microsoft/layoutlmv2-base-uncased")
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForTokenClassification.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")

The key-value pair extraction works at the token level only. You need to fine-tune the model on your own dataset, formatted like the FUNSD dataset. I did not dive very deep into the key-value pair extraction, but I did fine-tune the model, and it extracted key-value pairs nicely in 3 out of 5 documents. One more thing: it uses pytesseract behind the scenes, so whatever text pytesseract extracts is what the model processes.
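
A minimal inference sketch along these lines, assuming the FUNSD-finetuned checkpoint above and a hypothetical invoice.png; LayoutLMv2Processor runs Tesseract itself, so no separate OCR step is needed:

import torch
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")  # applies OCR internally
model = LayoutLMv2ForTokenClassification.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")

image = Image.open("invoice.png").convert("RGB")  # hypothetical input document
encoding = processor(image, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**encoding)

predictions = outputs.logits.argmax(-1).squeeze().tolist()
labels = [model.config.id2label[p] for p in predictions]
# `labels` now holds one FUNSD tag per token (e.g. B-QUESTION, I-ANSWER, O);
# grouping same-labeled tokens by position and pairing questions with answers
# is the extra post-processing step discussed earlier in this thread.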

Laxmi530 · Oct 28 '22

@Laxmi530
Thanks for the explanation. But LayoutLMv2ForTokenClassification does not associate keys and values; it only extracts keys and values at the token level, without grouping together the tokens that belong to the same key or value. That's why I wanted to know how you associate them on your side (i.e., token level -> key/value level -> key-value pair).

hjerbii · Oct 28 '22

Sure, I will help you, but I did not find any details on your GitHub profile. Can you please share something like your LinkedIn, so that in the future, if I need any help, I can message you?

Laxmi530 · Oct 29 '22

Thanks a lot, @Laxmi530. You should now see my LinkedIn profile link on my GitHub profile!

If you want, we can keep talking about the relation extraction there.

hjerbii · Oct 29 '22

Thank you so much @hjerbii for sharing your LinkedIn profile. Whatever doubts you have, we can discuss them over there. Thank you.

Laxmi530 · Oct 30 '22