Jack O'Sullivan
Hi @ross-spencer, I think we've hit this issue with a PDF that is generating 1.3GB of jhove xml output. As you have seen, there seems to be a "pages *...
I have seen similar behaviour (seemingly infinite spinning whilst analysing) on the attached PDF. [fulltext.pdf](https://github.com/openpreserve/jhove/files/4594193/fulltext.pdf)
I just had a look at that block of code for the file I attached. I don't see anything wrong with the pattern itself. It seems to get itself into...
Also wondering, given the stack trace and description, whether #306 is related.
@david-russo I wouldn't expect an escaped new-line character to be written in the middle of another character, which is what you seem to be suggesting here? That sounds like...
I've forwarded the requested bits of the file by email, let me know if you need anything else. Thanks, Jack
There are well-established index formats for WARCs that do what you're describing, collecting offsets for various pieces of content, and which are the basis of how the [wayback...
OK, great, was just a request for clarification then!