Greg Lindahl
Thank you for the excellent bug report, including the pywb version dependence. I can't access the WARC file on Slack because my org isn't a member; I only have guest access.
I see Ilya has been assigned by Tessa and I know he does have access to the IIPC Slack. So it's in good hands.
It's not a big deal to regenerate **all** the epubs -- and I'm looking forward to seeing how many books are going to cause print('Unexpected page type: ' + page_type)...
@oddhack2 The header/footer code is in common.par_is_pageno_header_footer(), which you can open a separate issue for. It's something you can debug directly with the _abbyy.gz files that you can download, since...
Hm, having asserted this could be done, I tried it, and it's more complicated than it looks. If I get it working before I finish this bottle of wine, I'll...
I agree this is a bit frustrating, but it is what it is. Using the command-line tools, if you include '-collection:printdisabled' in your search string, that will keep your search to...
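As a rough sketch of the search tip above, the exclusion term can be appended to any query string before it is handed to whichever search tool is in use. The helper name `exclude_printdisabled` and the sample query are made up for illustration; only the '-collection:printdisabled' term comes from the comment.

```python
# Hypothetical helper: only the "-collection:printdisabled" term is from the comment.
EXCLUDE_TERM = "-collection:printdisabled"


def exclude_printdisabled(query: str) -> str:
    """Return the query with the printdisabled exclusion appended once."""
    query = query.strip()
    # Avoid adding the term twice if the caller already included it.
    if EXCLUDE_TERM in query.split():
        return query
    return f"{query} {EXCLUDE_TERM}".strip()


if __name__ == "__main__":
    # The sample query is an assumption, not taken from the thread.
    print(exclude_printdisabled("collection:americana"))
```

The function only builds the string; how it is passed to the command-line tools is left out because the comment does not say.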
OK, so my search of 1000 books for an americana not-printdisabled book that has an un-handled abbyy page type came up empty. I've got some other downloads running while I...
Examples of type "chapter":
Unexpected page type: chapter in book advancedalgebras00senk
Unexpected page type: chapter in book americanbar1979jcfi
Unexpected page type: chapter in book americanbar1986jcfi
Unexpected page type: chapter in...
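For anyone collecting output like this across many books, here is a minimal sketch that tallies the messages by page type. It assumes every message follows the "Unexpected page type: X in book Y" shape shown above; the function name `tally_page_types` is made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape, taken from the sample lines above.
LINE_RE = re.compile(r"Unexpected page type: (?P<page_type>.+?) in book (?P<book>\S+)")


def tally_page_types(log_text):
    """Return (Counter of page types, dict mapping page type to book ids)."""
    counts = Counter()
    books = {}
    for match in LINE_RE.finditer(log_text):
        page_type = match["page_type"]
        counts[page_type] += 1
        books.setdefault(page_type, []).append(match["book"])
    return counts, books


if __name__ == "__main__":
    sample = (
        "Unexpected page type: chapter in book advancedalgebras00senk\n"
        "Unexpected page type: chapter in book americanbar1979jcfi\n"
        "Unexpected page type: chapter in book americanbar1986jcfi\n"
    )
    counts, books = tally_page_types(sample)
    for page_type, n in counts.most_common():
        print(page_type, n, books[page_type])
```

Because the pattern is applied with `finditer`, it works whether the messages are one per line or run together on a single line.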
Looks to me like this is going to be solved internally @ the archive. If you look in _scandata.xml, the addToAccessFormats true/false field was intended to be used to select...
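To make the addToAccessFormats idea concrete, here is a small sketch that lists the pages flagged true in a scandata document. It assumes each `page` element carries a `leafNum` attribute plus `pageType` and `addToAccessFormats` children, with no XML namespace; the sample XML is invented, and this is not the Archive's internal fix.

```python
import xml.etree.ElementTree as ET

# Invented sample; real _scandata.xml files carry many more fields.
SAMPLE_SCANDATA = """<book>
  <pageData>
    <page leafNum="0">
      <pageType>Cover</pageType>
      <addToAccessFormats>false</addToAccessFormats>
    </page>
    <page leafNum="1">
      <pageType>Chapter</pageType>
      <addToAccessFormats>true</addToAccessFormats>
    </page>
  </pageData>
</book>"""


def pages_in_access_formats(scandata_xml):
    """Return (leafNum, pageType) for pages whose addToAccessFormats is true."""
    root = ET.fromstring(scandata_xml)
    selected = []
    for page in root.iter("page"):
        # A missing field is treated as "not selected" in this sketch.
        flag = page.findtext("addToAccessFormats", default="").strip().lower()
        if flag == "true":
            selected.append((page.get("leafNum"), page.findtext("pageType")))
    return selected


if __name__ == "__main__":
    print(pages_in_access_formats(SAMPLE_SCANDATA))
```

Selecting on the flag rather than on the page type means an unfamiliar page type does not need special handling, provided the flag is set correctly upstream.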
Tom, as I mentioned above, this has been fixed a different way in our internal repo already, using the future-proofed addToAccessFormats field. That fix has now been QAed and at...