gitbook parser request

Open gahoo opened this issue 4 years ago • 4 comments

GitBook-based online books are gaining popularity. Is it possible to add a GitBook parser?

Here's an example site: http://bioconductor.org/books/release/OSCA/

gahoo avatar Dec 04 '20 01:12 gahoo

@gahoo It looks like it should be possible. I am starting to get snowed under with requests, so I'm going to suggest you try doing it yourself. How to:

  • https://dteviot.github.io/Projects/webToEpub_FAQ.html#write-parser
  • https://dteviot.github.io/Projects/webToEpub_CustomizingParserTemplate.html

If you get stuck, feel free to send me an email, or add a note to this issue.

Aside:

However, GitBooks is capable of generating an epub, as well as a web site. So, it would be better to ask sites to provide an epub to download. Or even better, add an option to GitBooks to generate an epub as part of the website content, and include a download link on the site. Hmm... looking at their docs https://docs.gitbook.com/features/pdf-export, they're already going that way.

dteviot avatar Dec 04 '20 03:12 dteviot

@gahoo A couple of notes, if you do try to do this.

  1. Because you want to use the parser based on page format, not site host, you'll need to do something like this to register the parser: https://github.com/dteviot/WebToEpub/blob/34afa797fe8ae9b6ba0bbdd8854e7ee0ac9ad668/plugin/js/parsers/MadaraParser.js#L15-L21
  2. WebToEpub doesn't handle the case of multiple entries in the Table of Contents pointing to the same web page, so you'll need to cull the URLs to sub-headings (i.e. the ones with a fragment identifier, or hash '#') in your implementation of getChapterUrls().
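
As a rough illustration of the culling step in note 2, here is a minimal sketch. The helper name `cullSubHeadingLinks` is hypothetical and not part of WebToEpub. It also assumes each table-of-contents entry is a plain object with `sourceUrl` and `title` fields, and the example.org URLs are placeholders.

```javascript
// Hypothetical helper (not part of WebToEpub): drops table-of-contents
// entries that point at sub-headings, keeping one entry per web page.
// Assumes each entry looks like { sourceUrl: string, title: string }.
function cullSubHeadingLinks(chapters) {
    const seenPages = new Set();
    const kept = [];
    for (const chapter of chapters) {
        const url = new URL(chapter.sourceUrl);
        // A non-empty hash means the link targets a sub-heading in a page.
        if (url.hash !== "") {
            continue;
        }
        // Identify a page by everything except the fragment.
        const pageKey = url.origin + url.pathname + url.search;
        if (seenPages.has(pageKey)) {
            continue;
        }
        seenPages.add(pageKey);
        kept.push(chapter);
    }
    return kept;
}

// Usage with placeholder URLs:
const toc = [
    { sourceUrl: "https://example.org/book/intro.html", title: "1 Introduction" },
    { sourceUrl: "https://example.org/book/intro.html#scope", title: "1.1 Scope" },
    { sourceUrl: "https://example.org/book/setup.html", title: "2 Setup" },
];
const chaptersToFetch = cullSubHeadingLinks(toc);
```

In a real parser, something like this would presumably be applied to the list of links gathered inside getChapterUrls() before it is returned. Note that a page linked only through fragment URLs would be dropped entirely by this sketch. If the site does that, stripping the fragment instead of skipping the entry may be the better choice.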

dteviot avatar Dec 04 '20 08:12 dteviot

Thanks for your quick response and patient introduction.

Some GitBook-based online books were generated by bookdown. Building them locally might require installing related packages and other extra effort, which could be time-consuming. So building an epub directly from the online version is the fastest way.

The default parser works well, except for chapters with multiple hierarchical subsections: these get broken into too many separate parts, leaving large blank areas on the page. However, I still don't know how to handle this case after reading Customizing the Template for a new Web Site.

Here is an example output epub file for your reference; it will expire in 7 days.

gahoo avatar Dec 09 '20 04:12 gahoo

I figured it out myself. Remove the following rules from the stylesheet, and everything works perfectly.

h1, h2 {
   text-align: center;
   page-break-before: always;
   margin-bottom: 10%;
   margin-top: 10%;
}
h3, h4, h5, h6 {
   text-align: center;
   margin-bottom: 15%;
   margin-top: 10%;
}

gahoo avatar Dec 09 '20 08:12 gahoo