pdfparser Ignore Form as well as Image XObjects when assembling the text array for a PDFObject.

Fix for #782

Oct 30 '25 11:10 rupertj

Thank you for your PR.

Is it still a work in progress?

If not, there are a few tasks left to solve before I take a closer look. Please read https://github.com/smalot/pdfparser/blob/master/CONTRIBUTING.md for more information.

Nov 03 '25 07:11 k00ni

Thanks for the reminder @k00ni. I've added test coverage for the change.

Nov 03 '25 09:11 rupertj

The change from "Imo" to "Im0" just corrects a typo in the existing test. I didn't spot the mistake when I originally wrote it.

I could revert that line and submit it as a separate PR if you like? I think keeping the new test coverage in the same method as the existing coverage makes sense, as they're testing the same bit of code.

Nov 07 '25 10:11 rupertj

Also, to clarify: when the command in the test data is "/Imo Do", the test passes, but for the wrong reason. We're checking that the XObject produces no result, and we get no result only because the code can't find an object called Imo.

When the command is "/Im0 Do", we still get no result, but we're getting it for the right reason. The code finds the XObject, sees that it's an image and then decides not to include it in the text array.
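To make the difference concrete, here is a minimal sketch of the two code paths. It is a simplified Python model, not pdfparser's actual PHP code: the function `resolve_do`, the dictionary layout, and the name `Fm0` are all made up for illustration.

```python
# Hypothetical model of resolving a "/<name> Do" command.
# This is NOT pdfparser's implementation; all names here are assumptions.

SKIPPED_SUBTYPES = {"Image", "Form"}


def resolve_do(name, xobjects):
    """Return (text_fragments, reason) for a "/<name> Do" command."""
    xobject = xobjects.get(name)
    if xobject is None:
        # Lookup failed: nothing is added, but only by accident.
        return [], "not found"
    if xobject["subtype"] in SKIPPED_SUBTYPES:
        # Object was found and deliberately left out of the text array.
        return [], "skipped " + xobject["subtype"]
    return list(xobject.get("text", [])), "included"


xobjects = {
    "Im0": {"subtype": "Image"},
    "Fm0": {"subtype": "Form"},
}

typo_result = resolve_do("Imo", xobjects)   # empty because the lookup fails
fixed_result = resolve_do("Im0", xobjects)  # empty because images are skipped
```

Both calls return an empty text list, so an assertion on the text alone can't tell them apart. Only the "Im0" version actually exercises the subtype check.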

Nov 07 '25 10:11 rupertj

Sorry for the delayed response.

I follow your arguments, and it looks good to me. The documentation provided in #782 was very helpful.

Nov 24 '25 07:11 k00ni

Thank you!

Nov 24 '25 09:11 rupertj