Transformers-Tutorials
Need help with key-value pair extraction.
Can someone please guide me on how to get the key-value pairs from a scanned invoice using LayoutLM?
Refer to https://github.com/huggingface/transformers/issues/15451#issue-1120232737
The relation extraction model works quite badly on real data. Maybe I failed in training or data prep; maybe you will be more successful with it.
My advice is to use the output of layoutlmv2_for_token_classification in some algorithmic logic for forming key-value pairs. You will need a module that groups text based on the labels (from the model predictions), the distance between tokens with the same label, the position of tokens with different labels, and so on.
I can't provide the code, but it works.
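To illustrate the grouping idea (a minimal sketch, not the commenter's actual code): assume word-level predictions with FUNSD-style QUESTION/ANSWER labels and word bounding boxes; the helper names and the nearest-neighbour distance heuristic below are purely hypothetical.

def group_words_by_label(words, boxes, labels):
    # Merge consecutive words that share a label into one entity (text + merged bounding box).
    entities, current = [], None
    for word, box, label in zip(words, boxes, labels):
        if current and current["label"] == label:
            current["text"] += " " + word
            current["box"] = [min(current["box"][0], box[0]), min(current["box"][1], box[1]),
                              max(current["box"][2], box[2]), max(current["box"][3], box[3])]
        else:
            if current:
                entities.append(current)
            current = {"label": label, "text": word, "box": list(box)}
    if current:
        entities.append(current)
    return entities

def pair_questions_with_answers(entities):
    # For each QUESTION entity, pick the spatially closest ANSWER entity.
    questions = [e for e in entities if e["label"] == "QUESTION"]
    answers = [e for e in entities if e["label"] == "ANSWER"]
    pairs = []
    for q in questions:
        qx, qy = q["box"][2], (q["box"][1] + q["box"][3]) / 2  # right edge, vertical centre
        def distance(a):
            ax, ay = a["box"][0], (a["box"][1] + a["box"][3]) / 2
            return abs(ax - qx) + 3 * abs(ay - qy)  # penalise vertical offset more heavily
        if answers:
            pairs.append((q["text"], min(answers, key=distance)["text"]))
    return pairs

Real documents usually need extra rules on top of this (multi-column layouts, answers placed below their keys, unanswered questions), which is exactly the kind of grouping module described above.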
@fraps12 Thanks for the reply.
I just wanted to see how the model predicts; based on that I will decide whether to go for fine-tuning or training. This is what I have tried so far, but I am getting an error. Can you please help me fix it?
import numpy as np
import pytesseract
import torch
from PIL import Image
from transformers import AutoTokenizer, LayoutLMv2FeatureExtractor, LayoutLMv2ForRelationExtraction
# LayoutLMv2ForRelationExtraction requires the layoutlmft build of transformers

feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = AutoTokenizer.from_pretrained(path, pad_token='<pad>')
model = LayoutLMv2ForRelationExtraction.from_pretrained(path)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
image_file = 'image4.png'
image = Image.open(image_file).convert('RGB')
image
width, height = image.size
w_scale = 1000/width
h_scale = 1000/height
ocr_data = pytesseract.image_to_data(image, output_type='data.frame')
ocr_data = ocr_data.dropna()
ocr_data = ocr_data.assign(left_scaled = ocr_data.left*w_scale, width_scaled = ocr_data.width*w_scale,
                           top_scaled = ocr_data.top*h_scale, height_scaled = ocr_data.height*h_scale,
                           right_scaled = lambda x: x.left_scaled + x.width_scaled,
                           bottom_scaled = lambda x: x.top_scaled + x.height_scaled)
float_cols = ocr_data.select_dtypes('float').columns
ocr_data[float_cols] = ocr_data[float_cols].round(0).astype(int)
ocr_data = ocr_data.replace(r'^\s*$', np.nan, regex=True)
ocr_data = ocr_data.dropna().reset_index(drop=True)
ocr_datawords = list(ocr_data.text)
coordinates = ocr_data[['left', 'top', 'width', 'height']]
actual_boxes = []
for idx, row in coordinates.iterrows():
    x, y, w, h = tuple(row)  # the row comes in (left, top, width, height) format
    actual_box = [x, y, x+w, y+h]  # turn it into (left, top, left+width, top+height) to get the actual box
    actual_boxes.append(actual_box)
def normalize_box(box, width, height):
    return [
        int(1000 * (box[0] / width)),
        int(1000 * (box[1] / height)),
        int(1000 * (box[2] / width)),
        int(1000 * (box[3] / height)),
    ]

boxes = []
for box in actual_boxes:
    boxes.append(normalize_box(box, width, height))
encoding = tokenizer.encode_plus(ocr_datawords, boxes=boxes, return_tensors='pt')
input_id = encoding['input_ids']
attention_masks = encoding['attention_mask']
boxes = encoding['bbox']
encoding.keys()
outputs = model(**encoding)
This is the error:
AttributeError Traceback (most recent call last)
c:\Users\name\Parallel\Trans_LayoutXLM.ipynb Cell 9 in <cell line: 1>()
----> [1](vscode-notebook-cell:/c%3A/Users/name/Parallel%20Project/Trans_LayoutXLM.ipynb#ch0000009?line=0) outputs = model(**encoding)
File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
File c:\Users\name\.conda\envs\layoutlmft\lib\site-packages\transformers\models\layoutlmv2\modeling_layoutlmv2.py:1585, in LayoutLMv2ForRelationExtraction.forward(self, input_ids, bbox, labels, image, attention_mask, token_type_ids, position_ids, head_mask, entities, relations)
1522 @add_start_docstrings_to_model_forward(LAYOUTLMV2_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1523 @replace_return_docstrings(output_type=RegionExtractionOutput, config_class=_CONFIG_FOR_DOC)
1524 def forward(
(...)
1535 relations=None,
1536 ):
1537 r"""
1538 entities (list of dicts of shape `(batch_size,)` where each dict contains:
1539 {
(...)
1582 >>> relations = *****
1583 ```"""
-> 1585 outputs = self.layoutlmv2(
1586 input_ids=input_ids,
1587 bbox=bbox,
1588 image=image,
1589 attention_mask=attention_mask,
1590 token_type_ids=token_type_ids,
1591 position_ids=position_ids,
1592 head_mask=head_mask,
...
--> 590 images_input = ((images if torch.is_tensor(images) else images.tensor) - self.pixel_mean) / self.pixel_std
591 features = self.backbone(images_input)
592 features = features[self.out_feature_key]
AttributeError: 'NoneType' object has no attribute 'tensor'
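For anyone hitting the same traceback: tokenizer.encode_plus only returns input_ids, token_type_ids, attention_mask and bbox, so the model's image argument stays None and the visual backbone fails on images.tensor. A minimal sketch of one way to supply the missing input, using the feature extractor declared above (an assumption about the fix, not necessarily how the issue was actually resolved):

features = feature_extractor(image, return_tensors='pt')    # apply_ocr=False, so this only resizes the image
encoding['image'] = features['pixel_values']                # image tensor of shape (1, 3, 224, 224)
encoding = {k: v.to(device) for k, v in encoding.items()}   # keep all inputs on the same device as the model
outputs = model(**encoding)

Note that, as the forward signature in the traceback shows, the relation-extraction head also expects entities (and relations during training), so those would still have to be constructed before this call returns usable predictions.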
I am able to extract the key-value pairs, hence closing the issue.
Hello @Laxmi530, could you please explain to me how you got the key-value pairs? Did you use the LayoutLMv2ForRelationExtraction model?
Thanks!
I used LayoutLMv2 for the key-value pair extraction. From the form recognition output, set the question as the key and the answer as the value. You need to apply some post-processing technique.
Thanks for your answer, @Laxmi530.
I used LayoutLMv2 for the key-value pair extraction.
You mean LayoutLMv2 for token classification, or is it another model?
To link the questions and answers, is it possible to share your approach? Actually, LayoutLMv2 for token classification operates only at token level, i.e. it does not detect full questions/answers, so sometimes it is not possible to associate tokens to get full keys/values.
Thanks a lot!
I used this:
feature_extractor = LayoutLMv2FeatureExtractor.from_pretrained("microsoft/layoutlmv2-base-uncased")
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForTokenClassification.from_pretrained("nielsr/layoutlmv2-finetuned-funsd")
The key-value pair extraction is based on tokens only. You need to fine-tune the model on your own dataset, similar to the FUNSD dataset. I did not dive deep into the key-value pair extraction, but I did fine-tune the model, and it extracts the key-values nicely in 3 out of 5 documents. One more thing: it uses pytesseract behind the scenes, so whatever text pytesseract extracts is what gets processed.
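For reference, a rough sketch of how these three components can be wired together for inference and how the token predictions map back to words (the input file name and the post-processing are assumptions, not @Laxmi530's exact code; the FUNSD-style labels are typically B/I-QUESTION, B/I-ANSWER, B/I-HEADER and O):

from PIL import Image
import torch

image = Image.open('invoice.png').convert('RGB')  # hypothetical input document

# 1. OCR + image preprocessing (apply_ocr defaults to True and uses pytesseract)
features = feature_extractor(image, return_tensors='pt')
words, boxes = features['words'][0], features['boxes'][0]

# 2. Tokenize the OCR'd words together with their normalized boxes
encoding = tokenizer(words, boxes=boxes, truncation=True, return_tensors='pt')

# 3. Forward pass with both the text and the image inputs
with torch.no_grad():
    outputs = model(image=features['pixel_values'], **encoding)

# 4. Map token-level predictions back to word-level labels
predictions = outputs.logits.argmax(-1).squeeze().tolist()
word_ids = encoding.word_ids(0)
id2label = model.config.id2label
word_labels = {}
for idx, word_id in enumerate(word_ids):
    if word_id is not None and word_id not in word_labels:
        word_labels[word_id] = id2label[predictions[idx]]  # e.g. B-QUESTION, I-ANSWER, O

From there, grouping QUESTION words into keys and ANSWER words into values (as sketched earlier in this thread) gives the key-value pairs.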
@Laxmi530
Thanks for the explanation. But LayoutLMv2ForTokenClassification does not associate keys and values.
It only extracts keys and values at token level, without associating together the tokens that belong to the same key or value.
That's why I wanted to know how you associate them on your side (i.e. token level -> key/value level -> key-value pair).
Sure, I will help you, but I did not find any contact details on your GitHub profile. Can you please share something like your LinkedIn, so that in the future, if I need any help, I can message you?
Thanks a lot, @Laxmi530. You should now see my LinkedIn profile link on my GitHub profile!
If you want, we can keep talking about the relation extraction there.
Thank you so much @hjerbii for sharing your LinkedIn profile. Whatever doubts there are, we will discuss them over there. Thank you.