amazon-textract-response-parser
Highlighted text gets appended with the word SELECTED in output
The Textract response library appends the text SELECTED whenever something in the document is highlighted. Examples are shown below. The original doc:
This is what the output looks like:
Code to generate the above output:
This looks like checkbox/selection-mark detection from the checkbox model, which is part of the TABLES and FORMS features. Those results are printed as part of the rendering when available, and unfortunately there is currently no parameter to turn them off. As a workaround, you could filter the SELECTION_ELEMENT blocks out of the JSON before passing it to trp, until we make a parameter available.
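A minimal sketch of that workaround in Python, assuming the Textract response is the usual dict with a top-level `Blocks` list, where each block may carry `Relationships` entries of the form `{"Type": ..., "Ids": [...]}`. The function name `drop_selection_elements` is hypothetical and not part of trp. It also strips the removed Ids from every remaining block's relationships, on the assumption that a parser following those links could fail on Ids that no longer exist.

```python
import copy
from typing import Any, Dict


def drop_selection_elements(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Textract response with SELECTION_ELEMENT blocks removed.

    Hypothetical helper: references to the removed blocks are also stripped
    from the remaining blocks' Relationships so nothing points at a missing Id.
    """
    result = copy.deepcopy(response)  # leave the caller's response untouched
    blocks = result.get("Blocks", [])
    removed = {b["Id"] for b in blocks if b.get("BlockType") == "SELECTION_ELEMENT"}

    kept = []
    for block in blocks:
        if block["Id"] in removed:
            continue
        rels = []
        for rel in block.get("Relationships", []):
            ids = [i for i in rel.get("Ids", []) if i not in removed]
            if ids:  # skip relationships that would otherwise be empty
                rels.append({**rel, "Ids": ids})
        if rels:
            block["Relationships"] = rels
        else:
            block.pop("Relationships", None)  # no remaining links at all
        kept.append(block)
    result["Blocks"] = kept
    return result
```

With the Python `trp` module, the cleaned dict would then be passed to `Document` in place of the raw response, e.g. `Document(drop_selection_elements(response))`. Note the trade-off: this removes checkbox state from the FORMS and TABLES output entirely, not just from the rendered text.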
@athewsey I'm running into a similar but opposite issue with selections:
SELECTED/NOT_SELECTED is detected properly under Forms, but doesn't render properly to HTML (it shows up as an empty figure div):
```html
<p>What type of financing is the Borrower seeking? *</p>
<p>Life</p>
<p>CMBS</p>
<div class="figure"></div>
<p>Agency</p>
<div class="figure"></div>
<p>Bridge</p>
<div class="figure"></div>
<p>Bank</p>
<div class="figure"></div>
<p>Credit Union</p>
<div class="figure"></div>
<p>Non-Bank</p>
```
Hi @sawasume, sorry, just to check: are you using TRP in JavaScript or Python?
If Python, you can ignore the rest of this message; if JS, some fact-finding would be useful:
In this case, the generated HTML suggests the items aren't being detected as checkboxes at all, since I'd expect to see an `<input>` tag if they were.
- Did you run the document analysis with the `FORMS` feature enabled in that case?
- (With forms enabled) Can you find the relevant checkboxes under `page.form` in the result?
- If the K-V detections aren't being merged properly into those HTML entries, do you find them duplicated anywhere else in the HTML output?
- Any chance you could locate the block for one of them (say, CMBS) in the raw data, and check that it's linked as a "relationship" from both A) a KEY_VALUE_SET block and B) a LINE which is in turn referenced by some kind of LAYOUT_* block?
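The raw JSON is language-agnostic, so that last check can be done outside TRP entirely. A minimal sketch in Python, assuming the response has been loaded into a dict with a top-level `Blocks` list; the helper names `find_blocks_by_text`, `find_parents` and `report` are hypothetical. `find_parents` works for any block Id, so it can equally be pointed at a SELECTION_ELEMENT block once you know its Id.

```python
from typing import Any, Dict, List, Tuple


def find_blocks_by_text(response: Dict[str, Any], text: str) -> List[Dict[str, Any]]:
    """Return WORD and LINE blocks whose Text exactly equals `text`."""
    return [
        b for b in response.get("Blocks", [])
        if b.get("BlockType") in ("WORD", "LINE") and b.get("Text") == text
    ]


def find_parents(response: Dict[str, Any], block_id: str) -> List[Tuple[str, str, str]]:
    """Return (parent Id, parent BlockType, relationship Type) for every block
    whose Relationships list `block_id`."""
    parents = []
    for b in response.get("Blocks", []):
        for rel in b.get("Relationships", []):
            if block_id in rel.get("Ids", []):
                parents.append((b["Id"], b["BlockType"], rel["Type"]))
    return parents


def report(response: Dict[str, Any], text: str) -> None:
    """Print each matching block, its parents, and their parents (two levels up)."""
    for block in find_blocks_by_text(response, text):
        print(block["BlockType"], block["Id"])
        for parent_id, parent_type, rel_type in find_parents(response, block["Id"]):
            print("  <-", rel_type, parent_type, parent_id)
            # One level further up, e.g. a LINE referenced by a LAYOUT_* block
            for gp_id, gp_type, gp_rel in find_parents(response, parent_id):
                print("      <-", gp_rel, gp_type, gp_id)
```

For example, load the saved response with `json.load` and call `report(response, "CMBS")`; in the expected case you would see a KEY_VALUE_SET among the parents, plus a LINE whose own parent is a LAYOUT_* block.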