Ignore pdf footer while reading
Hi,
I was previously using version 2.0.0, where the footer text was automatically ignored. I recently upgraded to version 2.4.0, and now when reading the PDF I get the footer text as well. I could not find any mention of this in the documentation.
Is there any flag that can be used to ignore the footer text?
Thanks for reporting this issue.
pdf-reader has never intentionally skipped content on a page, and nothing between 2.0.0 and 2.4.0 has changed that.
I guess it's possible one of the bug fixes in those versions means some text that was accidentally skipped is now being extracted?
Are you able to test the intermediate versions (2.1.0, 2.2.0, 2.2.1, 2.3.0) to pinpoint the exact version where you see the behaviour change?
Is there any flag that can be used to ignore the footer text?
Unfortunately, no. I'm not opposed to adding ways to target or ignore parts of a page, but for now there's no option to do it.
I guess it's possible one of the bug fixes in those versions means some text that was accidentally skipped is now being extracted?
Yes, I think so too.
I tested the intermediate versions: the behaviour changes starting with version 2.1.0.
Unfortunately, no. I'm not opposed to adding ways to target or ignore parts of a page, but for now there's no option to do it.
Okay, thanks, that answers my query. I'll see if I can find another way to identify footer text.
Given the change starts in 2.1.0, I'd guess it might be a result of this commit a8ca5dc
I'm going through the code to add a flag for the footer. Could we simply ignore the last line of every page when, say, an "ignore_footer" flag is present? Are there other scenarios I would have to consider? E.g. the footer can span two lines, or the flag is set but the page has no footer.
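In case it helps the discussion, here is a rough sketch of how the two scenarios above could be handled outside the library, working only on the strings returned by `page.text`. It is not pdf-reader's API: `strip_repeated_footers`, `footer_depth`, `normalize_footer_line` and the `max_lines` option are hypothetical names made up for illustration. The assumption is that a footer is a trailing line that repeats on every page once digits are masked (so "Page 1" matches "Page 2"). If nothing repeats, nothing is removed, which covers the "flag set but no footer" case.

```ruby
# Hypothetical helpers, not part of pdf-reader. They only operate on the
# plain strings that PDF::Reader::Page#text returns.

# Mask digits so "Page 1" and "Page 2" compare as equal.
def normalize_footer_line(line)
  line.strip.gsub(/\d+/, "#")
end

# Count how many trailing lines (up to max_lines) are shared by every page.
# Returns 0 when there are fewer than two pages, since repetition cannot
# be established from a single page.
def footer_depth(pages_lines, max_lines)
  return 0 if pages_lines.size < 2

  depth = 0
  1.upto(max_lines) do |k|
    candidates = pages_lines.map { |lines| lines[-k] }
    break if candidates.any?(&:nil?) # some page is too short to compare
    break unless candidates.map { |l| normalize_footer_line(l) }.uniq.size == 1

    depth = k
  end
  depth
end

# Drop the repeated trailing lines from each page's text.
# Note: blank lines are discarded, so the original layout is not preserved.
def strip_repeated_footers(page_texts, max_lines: 2)
  pages_lines = page_texts.map { |text| text.lines.map(&:rstrip).reject(&:empty?) }
  depth = footer_depth(pages_lines, max_lines)
  pages_lines.map { |lines| lines[0, lines.size - depth].join("\n") }
end

# Possible usage with pdf-reader (the file name is a placeholder):
#   require "pdf-reader"
#   reader = PDF::Reader.new("some.pdf")
#   bodies = strip_repeated_footers(reader.pages.map(&:text))
```

The obvious weakness is that a body line which happens to end every page identically would be treated as a footer. A position-based approach (ignoring text below a given y coordinate) would be more robust, but it would need support inside the library.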