amazon-textract-textractor

Fix linearize layout when Block Entity Types are `None`

Open BPDanek opened this issue 1 year ago • 1 comments

Entity Types are occasionally None, causing the linearize layout to fail.

This may happen with multi-page documents.

Issue #, if available:

Description of changes: Check whether entity_types is None before attempting to iterate over it. Without the check, linearization fails with "NoneType is not iterable".
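A minimal sketch of the kind of guard described above. The helper name `has_entity_type` and its signature are hypothetical and are not taken from the textractor code base; the sketch only illustrates treating a `None` value as an empty collection before a membership test.

```python
from typing import Iterable, Optional


def has_entity_type(entity_types: Optional[Iterable[str]], wanted: str) -> bool:
    """Return True if `wanted` is in `entity_types`, treating None as empty.

    Hypothetical helper for illustration only; not part of textractor.
    """
    # `entity_types or []` substitutes an empty list for None, so the
    # membership test below never touches a NoneType.
    return wanted in (entity_types or [])
```

With such a guard, a block whose entity types are missing is simply treated as carrying no entity types instead of raising.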

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

BPDanek, Sep 30 '24 21:09

Is there a better failure case that can be replicated? EntityTypes should only exist for Forms and Tables, and I am wondering if this path is being reached incorrectly. Your specific change is in the context of a table: does the page in question have a table, or should this code path not even be reachable in your test scenario?

Within a table I believe EntityTypes should always be populated. There is probably another issue at play in your test case.

andrewkowalik, Oct 07 '24 20:10

+1 to what @andrewkowalik wrote. EntityTypes should always exist in table output and has been returned by the Textract Tables API for roughly two years. If you are processing older responses, I would advise simply updating the responses themselves to include EntityTypes, but I don't see a need to support this in mainline.

Let me know if you have an example of a recent response that does not have the EntityType field as that would be a Textract bug. Thanks!
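For older saved responses, one possible way to do that update is sketched below. It assumes the response is a plain dict in the usual Textract JSON shape (a top-level `Blocks` list whose items carry `BlockType`), and it assumes that an empty list is an acceptable placeholder for a missing `EntityTypes` field; the function name `backfill_entity_types` and the choice of block types are illustrative, not part of textractor or the Textract API.

```python
import copy
from typing import Any, Dict

# Block types assumed (for this sketch) to be the ones that carry EntityTypes.
TABLE_BLOCK_TYPES = {"TABLE", "CELL", "MERGED_CELL"}


def backfill_entity_types(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `response` where table-related blocks that lack
    EntityTypes (or have it set to None) get an empty list instead."""
    patched = copy.deepcopy(response)  # leave the caller's dict untouched
    for block in patched.get("Blocks", []):
        if block.get("BlockType") in TABLE_BLOCK_TYPES and block.get("EntityTypes") is None:
            block["EntityTypes"] = []
    return patched
```

The patched dict can then be saved and loaded in place of the original response, so that downstream code always sees an iterable value.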

Belval, Nov 06 '24 16:11