rust-wildbow-scraper

The generated epub is invalid

Open mrothbart opened this issue 4 years ago • 5 comments

I tried importing the generated epub into Google Play Books and it failed. When I uploaded it to https://www.ebookit.com/tools/bp/Bo/eBookIt/epub-validator it came back with an incredible number of errors.

Additionally, is there any chance you could add support for some more serials, like The Gods Are Bastards or A Practical Guide to Evil?

mrothbart, Oct 27 '19 18:10

It looks like most of the errors from there are about invalid id attributes in the content file. They come from the epub generation library I'm using, which starts those ids with hyphens, and that is technically not allowed. However, none of them should affect your ability to read the epub, at least not in most readers. I just double-checked a fresh epub in Calibre, Google Play Books, and my personal Emacs epub reader, and it worked in all of them. Are you sure Google Play Books is unable to read the epub? I only used the web version, so it's possible the Android implementation is more finicky.

As far as adding support for more serials goes, it's totally possible, but right now the code is structured specifically for the WordPress format Wildbow uses. I'd want to restructure the code entirely to separate book-specific scraping from the more general code. I'll see about working on that, though.
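As a rough illustration of the id problem (not the epub library's actual code): an XML id has to start with a letter or underscore, so an id like `-chapter-1` is rejected by validators. The sketch below uses a conservative ASCII-only subset of that rule, and the function names `is_valid_xml_id` and `sanitize_id` are hypothetical, chosen just for this example.

```rust
/// Conservative ASCII-only check for an XML id: it must start with a letter
/// or underscore, followed by letters, digits, '-', '_' or '.'.
/// (The real XML rule also allows many non-ASCII characters.)
fn is_valid_xml_id(id: &str) -> bool {
    let mut chars = id.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false, // empty, or starts with '-', a digit, etc.
    }
    chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

/// Hypothetical fix-up: leave valid ids alone; otherwise replace disallowed
/// characters with '_' and add an "id" prefix so the result starts validly.
fn sanitize_id(id: &str) -> String {
    if is_valid_xml_id(id) {
        return id.to_string();
    }
    let cleaned: String = id
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.') {
                c
            } else {
                '_'
            }
        })
        .collect();
    format!("id{}", cleaned)
}
```

With this sketch, `sanitize_id("-chapter-1")` yields `"id-chapter-1"`, while an already valid id such as `"chapter-1"` is returned unchanged. Any real fix would also have to rewrite every reference to a renamed id.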

nicohman, Oct 27 '19 20:10

My e-reader refuses to open the generated books, saying that the epub might be corrupted or protected. If it helps, I tried opening them on a PocketBook 614. UPD: This happens on a NOOK as well.

MxDeWitt, Dec 31 '19 13:12

I am also having this problem. The epub is not showing up on my Kindle Paperwhite. My generated epub shows a lot of those id errors on the epub-validator as well.

Lupinicus, Feb 18 '20 21:02

I also had this issue and did some digging to see what I could figure out.

Reproduction:

  1. Cloned the repo, built it with cargo, and ran the scraper.
  2. Attempted to open with Adobe Digital Editions (it's all I had on my machine). Failed, epub invalid.
  3. Attempted to open on Kobo. Failed, complained about DRM for some reason (bad error message I presume).
  4. Opened on Android phone via Google Play Books. Worked just fine.

From there I opened up the epub file with 7zip and compared it against a known working epub. Found that the toc.ncx file was missing <!DOCTYPE ncx>. Added this line in as the second line of toc.ncx and rezipped the epub. It now works in both desktop Adobe Digital Editions and on my Kobo.

Perhaps adding some post-processing on the epub to add that line in would fix the compatibility problems? I'd throw some code together but it's late and I kinda just want to sit down and start reading Ward.
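A minimal sketch of that post-processing step, assuming the contents of toc.ncx are already available as a string (unzipping and rezipping the epub is left out here). The function name `add_ncx_doctype` is hypothetical, and the DOCTYPE line is copied verbatim from the manual fix described above; a stricter reader might want the full NCX DOCTYPE with its public identifier instead.

```rust
/// The line that was added by hand to toc.ncx in the manual fix.
const NCX_DOCTYPE: &str = "<!DOCTYPE ncx>";

/// Returns `ncx` with the DOCTYPE inserted on its own line directly after
/// the XML declaration (or at the very top if there is no declaration).
/// Input that already contains a DOCTYPE is returned unchanged.
fn add_ncx_doctype(ncx: &str) -> String {
    if ncx.contains("<!DOCTYPE") {
        return ncx.to_string();
    }
    // Byte offset just past the XML declaration's closing "?>", or 0 if
    // the document does not start with a declaration.
    let split = if ncx.starts_with("<?xml") {
        ncx.find("?>").map(|i| i + 2).unwrap_or(0)
    } else {
        0
    };
    let (head, tail) = ncx.split_at(split);
    if head.is_empty() {
        format!("{}\n{}", NCX_DOCTYPE, tail)
    } else {
        // Drop the whitespace that followed the declaration so the DOCTYPE
        // becomes the second line and the root element the third.
        format!("{}\n{}\n{}", head, NCX_DOCTYPE, tail.trim_start())
    }
}
```

Because the function leaves input that already has a DOCTYPE untouched, running it twice on the same file is harmless.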

Hope this helps!

jesseDtucker, Jun 08 '20 07:06

I don't know what the issue is exactly (it might be the same as lise-henry/epub-builder#10), but I had to unzip the generated epub and rezip it manually for it to work with my PocketBook ebook reader.

stefan0xC, Aug 30 '20 16:08