OCR-Form-Tools

Issue with the correct labeling of selectionMarks - v2.1 - preview.3

Open: at-philipp-heinrich opened this issue 5 years ago · 1 comment

Description: We have several identical files, with the same layout and selectionMarks in the same places, but filled in differently. In the tag editor, the selection marks are labeled on some pages but not on others. Even when I draw a region manually, it only returns NULL. The results also differ depending on whether the mark is handwritten or not.

The analyzed result shows the same behavior: in some files the selection marks are labeled and in others they are not. Since the files are identical in layout, there is no obvious reason why it reacts this way.

Questions:

  • In this case, is it advisable to add more files to give the model more information? So far we have added 12 files (identical in layout but filled in differently).
  • Could the shape of the selectionMarks be the reason they are not labeled properly?

Edit: The problem also occurs with other selectionMarks that are not as close to each other, as seen in the second image below. So in this case it can't be due to the layout.

Additional context:

[Image: Fott_examples]

[Image: Fott_examples_2]

at-philipp-heinrich, Apr 21 '21

I've seen similar issues with 'radio'-type selectionMarks; checkbox-style ones seem to always work. It's actually an issue with the Form Recognizer layout analyze API, not FOTT. You can confirm this by using the "layout analyze" option in fott-preview: the selection marks you are having issues with will not be found there either.
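If you want to check this across many files rather than one at a time in fott-preview, a small script over the layout output can flag the pages where marks went missing. Below is a minimal sketch, assuming the v2.1 layout result JSON shape in which each entry of `analyzeResult.readResults` has a `page` number and an optional `selectionMarks` list. The function names and the `expected_per_page` input (how many marks each page of your form really has) are hypothetical, and the sample data is illustrative, not real API output.

```python
from typing import Any, Dict, List


def selection_marks_per_page(layout_result: Dict[str, Any]) -> Dict[int, int]:
    """Count the selection marks reported on each page of a layout result.

    Assumes the v2.1 shape: analyzeResult.readResults[] entries, each with a
    "page" number and (when anything was detected) a "selectionMarks" list.
    """
    counts: Dict[int, int] = {}
    read_results = layout_result.get("analyzeResult", {}).get("readResults", [])
    for page in read_results:
        # Treat a missing "selectionMarks" key as zero detected marks.
        counts[page["page"]] = len(page.get("selectionMarks", []))
    return counts


def pages_missing_marks(
    layout_result: Dict[str, Any], expected_per_page: Dict[int, int]
) -> List[int]:
    """Return the pages where fewer marks were found than the form contains."""
    found = selection_marks_per_page(layout_result)
    return sorted(
        page
        for page, expected in expected_per_page.items()
        if found.get(page, 0) < expected
    )


if __name__ == "__main__":
    # Illustrative input only, not real API output: page 2 came back with no marks.
    sample = {
        "analyzeResult": {
            "readResults": [
                {"page": 1, "selectionMarks": [{"state": "selected"}, {"state": "unselected"}]},
                {"page": 2},
            ]
        }
    }
    print(pages_missing_marks(sample, {1: 2, 2: 2}))  # prints [2]
```

Comparing against an expected count, rather than just looking for empty lists, also catches pages where only some of the marks were dropped.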

I posted a question on Stack Overflow about this and heard back from Microsoft. I was able to send them some images I was having issues with; they ran them through the new version of the detection API and said that the issues I was facing were fixed in the next preview release, scheduled for ~5/21. https://stackoverflow.com/questions/67183842/training-custom-form-selectionmark-bounding-box-identification-issues

A couple of suggestions that may help:

  • When doing your custom model training, only use documents where all of the selectionMarks are found.
  • I've found that in many cases a higher-resolution image helps the Form Recognizer API detect them, but that isn't always an option for images submitted by end users.
  • The more documents in your training set, the better it will work. They suggest a minimum of 5, or 10-15 if there are issues, so with 12 you should be OK. I trained our model with 20 and am still seeing issues. https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/build-training-data-set#training-data-tips
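The first and third suggestions can be combined into a simple pre-training filter. A minimal sketch, assuming you already know, for each candidate document, how many selection marks the layout step detected (for example, by counting the `selectionMarks` entries in its layout output) and how many the form actually contains. `select_training_documents`, `has_enough_documents`, and `MIN_TRAINING_DOCS` are hypothetical names; the threshold of 5 is the documented minimum mentioned above, and the document counts are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical constant: the training-data tips suggest at least 5 documents
# (10-15 when results are poor).
MIN_TRAINING_DOCS = 5


def select_training_documents(
    detected_marks: Dict[str, int], expected_marks: int
) -> Tuple[List[str], List[str]]:
    """Split documents into those whose marks were all detected and the rest.

    detected_marks maps a document name to the number of selection marks the
    layout step found in it; expected_marks is how many the form really has.
    """
    keep = sorted(name for name, n in detected_marks.items() if n >= expected_marks)
    drop = sorted(name for name, n in detected_marks.items() if n < expected_marks)
    return keep, drop


def has_enough_documents(keep: List[str], minimum: int = MIN_TRAINING_DOCS) -> bool:
    """True if the filtered set still meets the suggested minimum size."""
    return len(keep) >= minimum


if __name__ == "__main__":
    # Illustrative counts only; b.pdf had one mark go undetected.
    detected = {"a.pdf": 8, "b.pdf": 7, "c.pdf": 8}
    keep, drop = select_training_documents(detected, expected_marks=8)
    print(keep, drop)  # prints ['a.pdf', 'c.pdf'] ['b.pdf']
    if not has_enough_documents(keep):
        print(f"Only {len(keep)} usable documents; add more before training.")
```

Filtering first and then checking the count makes the trade-off explicit: dropping documents with missed marks can push the set below the suggested minimum, which is a signal to add more samples.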

-Rich W

RJWerning, May 04 '21