amazon-textract-response-parser

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Open CGarces opened this issue 3 years ago • 0 comments

I'm trying to load a .json file into trp.Document. The file is the result of a Textract execution started with start_document_analysis.

The code runs on Python 3.9.
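For context, results of a job started with start_document_analysis are read back with get_document_analysis, which returns them in pages linked by NextToken. The following is only a minimal sketch of how all pages could be gathered before saving them to disk. It assumes a boto3 Textract client and a job that has already succeeded; the helper name fetch_all_pages is hypothetical and not part of trp or boto3.

```python
from typing import Any, Dict, List


def fetch_all_pages(client: Any, job_id: str) -> List[Dict[str, Any]]:
    """Collect every response page of a finished analysis job.

    `client` is assumed to be a boto3 Textract client (or anything with a
    compatible get_document_analysis method). Job status is not checked here.
    """
    pages: List[Dict[str, Any]] = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            # Ask for the page that follows the previous response
            kwargs["NextToken"] = next_token
        response = client.get_document_analysis(**kwargs)
        pages.append(response)
        next_token = response.get("NextToken")
        if not next_token:
            # No token means this was the last page
            return pages
```

The returned list keeps each response page separate, so the saved file would contain a JSON list rather than a single response object.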

I load the JSON file into a variable and try to parse it with trp:

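    import json
    from trp import Document

    # Load the saved Textract response from disk
    with open(f"/tmp/{file_name}", "rb") as json_file:
        textract_json = json.load(json_file)

    # Parse the response with trp
    doc = Document(textract_json)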

It works fine with the results of synchronous calls, but not with the results of asynchronous calls:

Traceback (most recent call last):
  File "/var/task/app.py", line 115, in lambda_handler
    process_file(file_name)
  File "/var/task/app.py", line 53, in process_file
    doc = Document(textract_json)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 638, in __init__
    self._parse()
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 675, in _parse
    page = Page(documentPage["Blocks"], self._blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 522, in __init__
    self._parse(blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 543, in _parse
    t = Table(item, blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 438, in __init__
    cell = Cell(blockMap[cid], blockMap)
  File "/var/lang/lib/python3.9/site-packages/trp/__init__.py", line 361, in __init__
    self._text = self._text + w.text + ' '
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
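The traceback ends while a table cell is concatenating the text of its child words, so a quick look at the shape of the loaded JSON may help narrow things down. This is a diagnostic sketch only, not a fix and not a statement of the cause: it assumes the standard Textract block fields (Id, BlockType, Text, Relationships) and the function name summarize_textract_json is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Union

TextractJson = Union[Dict[str, Any], List[Dict[str, Any]]]


def summarize_textract_json(data: TextractJson) -> Dict[str, Any]:
    """Report a few facts about a saved Textract response (one page or a list)."""
    pages = data if isinstance(data, list) else [data]
    blocks = [b for page in pages for b in page.get("Blocks", [])]
    known_ids = {b["Id"] for b in blocks}
    # Every block id mentioned in any relationship of any block
    referenced_ids = {
        cid
        for b in blocks
        for rel in b.get("Relationships") or []
        for cid in rel.get("Ids", [])
    }
    return {
        "response_pages": len(pages),
        # A leftover NextToken suggests more result pages were never fetched
        "has_next_token": any(p.get("NextToken") for p in pages),
        "block_types": Counter(b.get("BlockType") for b in blocks),
        "words_without_text": [
            b["Id"]
            for b in blocks
            if b.get("BlockType") == "WORD" and b.get("Text") is None
        ],
        # Ids that are referenced but not present in the file
        "dangling_child_ids": sorted(referenced_ids - known_ids),
    }
```

Calling summarize_textract_json(textract_json) on the file from the asynchronous run and on one from a synchronous run would show whether the two differ in structure.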

Using doc = TDocumentSchema().load(textract_json) also gives me validation exceptions.

Any clue about what I'm doing wrong?

CGarces, Sep 25 '22 22:09