Fix handling of empty lists and malformed PDF dictionary values

Open pbottine opened this issue 1 month ago • 0 comments

This fixes issue #12 where certain malformed PDFs would cause "List index out of range" errors during parsing.

Changes to PDFList.load():

Changes to parse_object() dictionary handling:

Fix logic ordering: check isinstance(value, list) before checking emptiness. This prevents skipping falsy but valid values like 0, False, or empty strings.
Gracefully skip empty lists in dictionaries (log debug message)
Catch ValueError from PDFList.load() for truly malformed lists
Log warnings instead of raising exceptions for unexpected values
Continue parsing to extract maximum data from malformed PDFs
Update dictionary to keep it self-consistent after wrapping lists
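
For reviewers, the dictionary handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `PDFList` class here is a hypothetical stand-in, `wrap_dict_lists` is an invented helper name, and the rule for what counts as "malformed" (a `None` element) is an assumption chosen only to trigger the `ValueError` path. It also assumes "skip" means leaving the empty list in place untouched.

```python
import logging

log = logging.getLogger("pdf_sketch")  # hypothetical logger name


class PDFList:
    """Hypothetical stand-in for the project's PDFList wrapper."""

    def __init__(self, items):
        self.items = items

    @staticmethod
    def load(items):
        # Assumption: the real load() raises ValueError on malformed input.
        # Here an empty list or a None element stands in for "malformed".
        if not items or any(item is None for item in items):
            raise ValueError("malformed list")
        return PDFList(list(items))


def wrap_dict_lists(obj):
    """Wrap list values of a parsed dictionary in place (illustrative helper)."""
    for key, value in list(obj.items()):
        # Type check first, so falsy non-list values (0, False, "") are
        # never mistaken for empty lists and skipped.
        if not isinstance(value, list):
            continue
        if not value:
            log.debug("skipping empty list for key %r", key)
            continue
        try:
            # Write the wrapped value back so the dictionary stays consistent.
            obj[key] = PDFList.load(value)
        except ValueError as exc:
            # Warn and keep going to extract as much data as possible.
            log.warning("could not wrap list for key %r: %s", key, exc)
    return obj


parsed = {"Count": 0, "Flag": False, "Name": "", "Kids": [], "Bad": [1, None], "Good": [1, 2]}
wrap_dict_lists(parsed)
```

In this sketch, `Count`, `Flag`, and `Name` pass through unchanged, `Kids` is skipped with a debug message, `Bad` triggers a warning and is left as a plain list, and only `Good` is wrapped.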

Testing:

This builds on the approach from PR #3426 by @mrscottyrose, with corrections to logic ordering and offset calculation, plus comprehensive test coverage.

Fixes #12

cc: @smoelius

🤖 Generated with Claude Code

Nov 26 '25 20:11 pbottine