HummusJSSamples text-extraction thwarted by inline images


Open catagras opened this issue 6 years ago • 3 comments

I played a bit with the text-extraction sample and found that when an inline image (a BI / ID / EI construct) is encountered, the rest of the page is skipped. Most likely this happens because the image stream that follows ID is parsed as PDF tokens, not as a stream.

Any hint on how I might skip inline images?

Thanks!

May 23 '18 04:05 catagras
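
One possible workaround, as a minimal sketch only: strip the BI / ID / EI spans from the decoded content stream bytes before handing them to the tokenizer. The names `stripInlineImages` and `findOperator` are hypothetical helpers and are not part of HummusJS or the text-extraction sample. The sketch assumes the content stream is already available as a Node.js `Buffer`, and it treats `EI` surrounded by whitespace as the end of the image, which is a heuristic: binary image data can contain the same bytes, and `BI` inside a string literal would also be matched.

```javascript
// Hypothetical helpers, not part of HummusJS: remove inline images
// (BI ... ID <binary data> EI) from a decoded content stream.

// Whitespace characters as defined for PDF syntax.
const PDF_WHITESPACE = new Set([0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Out-of-range Buffer indexes yield undefined, which we treat as whitespace
// so that an operator at the very start or end of the data still matches.
function isWhite(byte) {
  return byte === undefined || PDF_WHITESPACE.has(byte);
}

// Find a two-letter operator that stands alone between whitespace,
// starting the search at offset `from`. Returns -1 when not found.
function findOperator(buf, op, from) {
  let i = buf.indexOf(op, from, 'latin1');
  while (i !== -1) {
    if (isWhite(buf[i - 1]) && isWhite(buf[i + 2])) return i;
    i = buf.indexOf(op, i + 1, 'latin1');
  }
  return -1;
}

// Return a copy of `content` with every BI ... ID ... EI span removed.
// Heuristic: the first whitespace-delimited EI after ID ends the image.
function stripInlineImages(content) {
  const kept = [];
  let pos = 0;
  while (pos < content.length) {
    const bi = findOperator(content, 'BI', pos);
    if (bi === -1) break;
    const id = findOperator(content, 'ID', bi + 2);
    if (id === -1) break;
    // Image data starts after the single whitespace byte that follows ID.
    const ei = findOperator(content, 'EI', id + 3);
    if (ei === -1) break;
    kept.push(content.subarray(pos, bi));
    pos = ei + 2; // continue right after the EI operator
  }
  kept.push(content.subarray(pos));
  return Buffer.concat(kept);
}
```

The cleaned buffer could then be tokenized in place of the original bytes. A more robust version would read the image dictionary between BI and ID and use its width, height, bit depth, and filter entries to work out where the data really ends, instead of scanning for `EI`.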
