Feature: Warn on unicode decoding errors in PDF annotations
In some PDFs, annotations contain invalid or extraneous data that cannot be decoded, and the resulting UnicodeDecodeError aborts annotation processing. To mitigate this, the warn_unicode_error parameter of the PDF initializer and the .open() method lets users bypass these errors and emit a warning instead, so that such anomalies no longer halt processing.
Example warning, if activated:
UserWarning: Could not decode contents for annotation. Annotation contents will be missing.
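A minimal sketch of the behavior described above, assuming annotation contents arrive as raw bytes and are decoded as UTF-8 (a simplification; real PDF strings may use other encodings). The helper name `decode_annotation_contents` is hypothetical, not pdfplumber's code. The flag uses the name `raise_unicode_errors` that the thread settles on later, and its default of `True` is an assumption chosen so that existing behavior (raising) is preserved.

```python
import warnings


def decode_annotation_contents(raw: bytes, raise_unicode_errors: bool = True):
    """Hypothetical helper: decode an annotation's raw contents as UTF-8.

    raise_unicode_errors=True (assumed default): the UnicodeDecodeError
    propagates and aborts processing, as before the change.
    raise_unicode_errors=False: the error is downgraded to a UserWarning
    and the contents are treated as missing (None).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if raise_unicode_errors:
            raise
        # warnings.warn defaults to the UserWarning category.
        warnings.warn(
            "Could not decode contents for annotation. "
            "Annotation contents will be missing."
        )
        return None
```

With the flag off, `decode_annotation_contents(b"\xff", raise_unicode_errors=False)` returns `None` and emits the warning shown above instead of crashing.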
Hi @stolarczyk, and thank you for this suggestion. It makes sense to provide such warnings, although I'd lean toward a more generalizable approach rather than adding a parameter for each type of warning. To that end, I'd prefer to rely on Python's built-in warning filtering. I'm open to other opinions, though. What do you think?
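For reference, a small sketch of what Python's built-in warning filtering can and cannot do. Filters decide what happens to warnings that code has already chosen to emit (ignore them, show them, or escalate them into exceptions), but they have no effect on an exception such as `UnicodeDecodeError` that is raised directly. `emit_warning` and `raise_error` are illustrative stand-ins, not pdfplumber code.

```python
import warnings


def emit_warning() -> None:
    # Stands in for library code that has already chosen to warn.
    warnings.warn("Could not decode contents for annotation.", UserWarning)


def raise_error() -> None:
    # Stands in for library code that lets the exception propagate.
    b"\xff".decode("utf-8")


# Filters can silence a warning...
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    emit_warning()  # nothing is shown

# ...or escalate it into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        emit_warning()
    except UserWarning as exc:
        print("escalated:", exc)

# But no filter setting turns a raised exception into a warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        raise_error()
    except UnicodeDecodeError as exc:
        print("still raised:", type(exc).__name__)
```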
Thanks for looking into this, @jsvine!
I'm not sure how built-in warning filtering would address the issue I'm describing. The proposed change turns an exception (UnicodeDecodeError) into a warning, which keeps PDF processing from crashing entirely. So the warning is the result, not the issue at hand.
My apologies for the misunderstanding! I think the name of the proposed parameter threw me off, but I should also have looked more closely. I think I understand it now. This proposal makes sense, though what about renaming the parameter to raise_unicode_errors=True/False?
Thanks for the suggestion. Just renamed it.
Thanks, @stolarczyk. I've pushed a small tweak (above) so that the linter is happy. But it looks like we're missing a bit of test coverage:
@jsvine, the test has been added.
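A sketch of the kind of test that covers both branches, written with the standard-library `unittest` module. `decode_contents` is a hypothetical stand-in for the patched decoding path, assuming UTF-8 decoding; a real test would instead open a PDF fixture whose annotation contents fail to decode, which is not reproduced here.

```python
import unittest
import warnings


def decode_contents(raw: bytes, raise_unicode_errors: bool = True):
    # Hypothetical stand-in for the patched annotation-decoding path.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if raise_unicode_errors:
            raise
        warnings.warn(
            "Could not decode contents for annotation. "
            "Annotation contents will be missing."
        )
        return None


class TestRaiseUnicodeErrors(unittest.TestCase):
    BAD_BYTES = b"\xff\xfe"  # not valid UTF-8

    def test_raises_by_default(self):
        # Default behavior: the decoding error still propagates.
        with self.assertRaises(UnicodeDecodeError):
            decode_contents(self.BAD_BYTES)

    def test_warns_when_disabled(self):
        # Opt-out behavior: a UserWarning is emitted and contents are missing.
        with self.assertWarns(UserWarning) as cm:
            result = decode_contents(self.BAD_BYTES, raise_unicode_errors=False)
        self.assertIsNone(result)
        self.assertIn("Could not decode contents", str(cm.warning))
```

Saved to a file (for example, a hypothetical `test_annotations.py`), it can be run with `python -m unittest test_annotations.py`.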
Thanks! Merged into develop