Representation-Learning-for-Information-Extraction

Is candidate generation important during inference?

Open vishal-nayak1 opened this issue 2 years ago • 7 comments

Hi @Praneet9, is candidate generation important during inference? For some fields, such as address, company name, and registration number, it is difficult to extract text with regex because their patterns change from template to template. Also, if I do not provide candidates for a field like address, will the model still be able to predict it?

vishal-nayak1 avatar Nov 28 '22 04:11 vishal-nayak1

Think of candidates as a mixture of positive and negative samples. If the model doesn't see any negative examples, it is difficult to differentiate between right and wrong. This is why I personally feel candidates are required for training. For inference, it should not matter if you have candidates or not.
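A minimal sketch of the positive/negative idea, not code from the repository: every generated candidate becomes a training example, labelled 1 if it matches the ground truth for its field and 0 otherwise. The `Candidate` class, `label_candidates` function, and sample values are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str   # text of the OCR span proposed as a candidate
    field: str  # field this candidate is proposed for


def label_candidates(candidates, ground_truth):
    """Pair each candidate with 1 (positive) or 0 (negative).

    A candidate is positive when its text equals the annotated
    value for its field; everything else is a negative sample.
    """
    return [(c, int(ground_truth.get(c.field) == c.text)) for c in candidates]


# Hypothetical candidates generated for one invoice.
candidates = [
    Candidate("INV-001", "invoice_number"),
    Candidate("PO-778", "invoice_number"),
    Candidate("2022-11-28", "invoice_date"),
]
ground_truth = {"invoice_number": "INV-001", "invoice_date": "2022-11-28"}

labelled = label_candidates(candidates, ground_truth)
labels = [label for _, label in labelled]
```

Here `PO-778` looks like an invoice number but is not the annotated one, so it serves as the negative example the model needs in order to learn the difference.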

Praneet9 avatar Nov 28 '22 10:11 Praneet9

@Praneet9 thanks for sharing the details. I have one doubt: in the inference file I can see that you generate candidates, which are then fed to the model as input. Link: https://github.com/Praneet9/Representation-Learning-for-Information-Extraction/blob/b268463e312689e3dc5222cebf0f5a2e4be68fb6/inference.py#L130

    candidates = extract_candidates.get_candidates(ocr_results)
    candidates_with_neighbours = attach_neighbour_candidates(width, height, ocr_results, candidates)
    annotation = normalize_coordinates(candidates_with_neighbours, width, height)

Model input:

    with torch.no_grad():
        rlie.eval()
        val_outputs = rlie(field_ids, candidate_cords, neighbours, neighbour_cords)

Please clarify it.

Thanks

vishal-nayak1 avatar Nov 28 '22 10:11 vishal-nayak1

I'm passing all the possible candidates that can be the classes I want. The model picks the most relevant one from them.
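As a rough illustration of "picks the most relevant one", assuming the model yields one relevance score per candidate: group the candidates by field and keep the highest-scoring one in each group. The `pick_most_relevant` function, the `(field, text)` tuples, and the scores below are hypothetical, not taken from `inference.py`.

```python
from collections import defaultdict


def pick_most_relevant(candidates, scores):
    """Return the highest-scoring candidate text for each field.

    candidates: list of (field, text) tuples
    scores: one model score per candidate, in the same order
    """
    by_field = defaultdict(list)
    for (field, text), score in zip(candidates, scores):
        by_field[field].append((score, text))
    # max() compares the score first, so the top-scored text wins.
    return {field: max(scored)[1] for field, scored in by_field.items()}


# Hypothetical candidates and scores for one document.
candidates = [
    ("invoice_number", "INV-001"),
    ("invoice_number", "PO-778"),
    ("invoice_date", "2022-11-28"),
]
scores = [0.91, 0.12, 0.88]

best = pick_most_relevant(candidates, scores)
```

A field with a single candidate simply returns that candidate, and a field with no candidates does not appear in the result at all.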

Praneet9 avatar Nov 28 '22 10:11 Praneet9

Yes, but is it necessary? For fields like address, company name, registration name, etc., we cannot easily extract possible candidates using regex. If I do not pass any candidates for such fields, will the model still be able to predict the address field?

vishal-nayak1 avatar Nov 28 '22 10:11 vishal-nayak1

Here, in inference, we don't know what the actual invoice number is, which is why we send everything that looks like one. In your case, you can just send whatever looks like the address, even if it's the only candidate, and it should work fine.

Praneet9 avatar Nov 28 '22 10:11 Praneet9

Okay, but in that case the model is not actually extracting fields, as in extracting an address from a paragraph of text; it is just ranking the candidates we pass in. I think generating possible candidates for fields like address and registration_number is itself challenging.

vishal-nayak1 avatar Nov 28 '22 11:11 vishal-nayak1

This is a binary model that just returns True or False for each candidate you pass in; it is not meant to do what you are asking for.
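To make the consequence concrete, here is a small sketch that assumes the model output is a score per candidate, thresholded into True or False. The `classify_candidates` function and the 0.5 threshold are assumptions for illustration, not the repository's code.

```python
def classify_candidates(scores, threshold=0.5):
    """Turn per-candidate scores into True/False decisions.

    The threshold value is an assumed default, not taken from the repo.
    """
    return [score >= threshold for score in scores]


# Two candidates were passed in for a field: one accepted, one rejected.
decisions = classify_candidates([0.91, 0.12])

# No candidates were passed in for a field: there is nothing to accept.
no_decisions = classify_candidates([])
```

Because the decision is made per candidate, a field for which no candidates are supplied produces no prediction; the model cannot locate the value in the document on its own.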

Praneet9 avatar Nov 28 '22 11:11 Praneet9