amazon-textract-response-parser
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
I'm trying to load a .json file into trp.Document. The file is the result of a Textract job started with start_document_analysis.
The code runs on Python 3.9.
I load the JSON file into a variable and try to parse it with trp:
import json
from trp import Document

# Load json
with open(f"/tmp/{file_name}", "rb") as json_file:
    textract_json = json.load(json_file)
doc = Document(textract_json)
It works fine with responses from synchronous calls, but not with responses from asynchronous calls:
Traceback (most recent call last):
  File "/var/task/app.py", line 115, in lambda_handler
    process_file(file_name)
  File "/var/task/app.py", line 53, in process_file
    doc = Document(textract_json)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 638, in __init__
    self._parse()
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 675, in _parse
    page = Page(documentPage["Blocks"], self._blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 522, in __init__
    self._parse(blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 543, in _parse
    t = Table(item, blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 438, in __init__
    cell = Cell(blockMap[cid], blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 361, in __init__
    self._text = self._text + w.text + ' '
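The error message says the left operand of + is None, so in the failing line it looks like self._text is None before the first word is appended. One thing that might be worth checking, purely as a guess and not something confirmed by the library, is whether the saved JSON contains blocks whose "Text" key is present but null. Below is a minimal diagnostic sketch; the helper name find_null_text_blocks and the sample data are made up for illustration and are not part of trp or of a real Textract response.

```python
from collections import Counter


def find_null_text_blocks(textract_json):
    """Count blocks, per BlockType, whose "Text" key is present but null.

    Hypothetical helper for debugging; it only inspects the raw dict.
    """
    counts = Counter()
    for block in textract_json.get("Blocks", []):
        if "Text" in block and block["Text"] is None:
            counts[block.get("BlockType", "UNKNOWN")] += 1
    return dict(counts)


# Tiny hand-made sample (not real Textract output) to show the idea.
sample = {
    "Blocks": [
        {"Id": "1", "BlockType": "CELL", "Text": None},   # explicit null
        {"Id": "2", "BlockType": "WORD", "Text": "hello"},
        {"Id": "3", "BlockType": "CELL"},                 # key absent
    ]
}

print(find_null_text_blocks(sample))  # {'CELL': 1}
```

Running the same function on textract_json from the snippet above would show whether the asynchronous output differs from the synchronous one in this respect. An empty result would rule this guess out.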
Using doc = TDocumentSchema().load(textract_json) also gives me validation exceptions.
Any clue about what I'm doing wrong?