mdBook
Support ebooks and pdf export
GitBook supports exporting to ebooks and PDFs via Calibre. This might be easy to hook into.
See also https://github.com/rust-lang/rust-by-example/issues/684 for problems this implementation creates for rustbyexample.
I would like to support PDF and ebook formats. I think this could already be developed out of tree using the Renderer trait from mdBook.
I am not sure I want to depend on a full-blown GUI tool, though. There must surely be a better alternative for that.
Not familiar with many conversion tools like this. Pandoc also seems like a plausible option. Don't know of any others.
Yeah pandoc seems a lot better!
Did some exploration on this and it seems doable. Here's the default EPUB version of the Rust book. Note that the chapters are out of order and the links don't work.
To get good output, I think we would need to:
- parse the ToC to get the list of md files, in the right order
- concat and transform the markdown files, replacing file links with internal links
- match the themes with epub versions of the styles
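The first step in the list above can be sketched in plain Rust. This is a minimal illustration, not mdBook's own SUMMARY.md parser: it assumes every chapter appears as a single markdown link of the form [Title](path.md) on its own line, and it returns the paths in reading order. The function name chapter_paths is hypothetical.

```rust
/// Extract chapter file paths from SUMMARY.md-style text, in reading order.
/// Assumption: each chapter line contains one link `[Title](path.md)`;
/// lines without such a link (headings, blank lines) are skipped.
fn chapter_paths(summary: &str) -> Vec<String> {
    let mut paths = Vec::new();
    for line in summary.lines() {
        // Find the "](" that ends the link text, then the closing ")".
        if let Some(start) = line.find("](") {
            let rest = &line[start + 2..];
            if let Some(end) = rest.find(')') {
                let target = rest[..end].trim();
                if target.ends_with(".md") {
                    paths.push(target.to_string());
                }
            }
        }
    }
    paths
}
```

Nesting (indented sub-chapters) is simply flattened here, which is the order a single concatenated document needs anyway.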
I'm interested in working on this but will be a bit slow.
Useful info here: Pandoc commands and styling options
- parse the ToC to get the list of md files, in the right order
- concat and transform the markdown files, replacing file links with internal links
@asolove, I have implemented this (among other transformations) in https://github.com/killercup/trpl-ebook, feel free to use my code.
@killercup great, thanks!
Great! Thanks for doing this :)
parse the ToC to get the list of md files, in the right order
This is already done in the Rust code, the MDBook struct can be iterated on. If you make a new Renderer you have access to that.
concat and transform the markdown files, replacing file links with internal links
Concatenating the markdown files is also not that hard, I do it for the print page.
Replacing the links could be a little trickier: what should internal links look like for pandoc? I know that pulldown-cmark gives you the ability to transform the parsed markdown events before rendering, but it's not well documented. Maybe link replacing is within its capabilities.
Static files, like images, will probably also need some special treatment to be included correctly?
I'm interested in working on this but will be a bit slow.
That is absolutely no problem, there is no rush. ~~I will assign this issue to you so that others can see you are working on it.~~ (can't assign you). If you need any help, feel free to ask here :)
I am also planning on doing a big refactor (#90) to clean up the code and create a better API. For example, I am thinking about adding a way to poll the MDBook struct for specific chapters, etc. This would make it a lot more flexible for Renderers, and for something like #93 if I end up doing that. If you have suggestions or requests that might be relevant, post them in #90 so that I / we can brainstorm and come up with a good design :)
Replacing the links could be a little trickier, what should internal links look like for pandoc?
FYI, I'm doing some regex work to transform links relative to the doc.rust-lang.org domain and make reference link names unique for the combined markdown file.
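```rust
use regex::Regex;

// Rewrite cross-chapter links in the combined markdown into internal anchors.
// `[\w-]` replaces the original `[\w-_]`: `\w` already matches `_`, and current
// versions of the regex crate reject `\w-_` as an invalid class range.
fn rewrite_cross_links(mut output: String) -> String {
    // `](foo.html)` -> `](#sec--foo)`
    let cross_section_link = Regex::new(r"]\((?P<file>[\w-]+)\.html\)").unwrap();
    output = cross_section_link.replace_all(&output, r"](#sec--$file)").into_owned();
    // `[id]: foo.html` -> `[id]: #sec--foo`
    let cross_section_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html$").unwrap();
    output = cross_section_ref.replace_all(&output, r"[$id]: #sec--$file").into_owned();
    // `](foo.html#bar)` -> `](#bar)`
    let cross_subsection_link = Regex::new(r"]\((?P<file>[\w-]+)\.html#(?P<subsection>[\w-]+)\)").unwrap();
    output = cross_subsection_link.replace_all(&output, r"](#$subsection)").into_owned();
    // `[id]: foo.html#bar` -> `[id]: #bar`
    let cross_subsection_ref = Regex::new(r"(?m)^\[(?P<id>.+)\]:\s(?P<file>[^:^/]+)\.html#(?P<subsection>[\w-]+)$").unwrap();
    output = cross_subsection_ref.replace_all(&output, r"[$id]: #$subsection").into_owned();
    output
}
```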
Thanks! Does pandoc auto-generate anchors like #sec--$file from the markdown files in those formats, or is that also handled by your code?
@azerupi I'm pretty sure pandoc generates those. I've had problems before because pandoc generates slugs in a different way than rustdoc.
It should be possible to add a specific id to each header, though. The syntax is # Header Name {#header-name} IIRC.
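Assigning explicit ids makes the anchors predictable after concatenation, independent of how pandoc would slug the heading text. A minimal sketch, not trpl-ebook's code: it assumes each chapter file has one top-level "# " heading and that chapters should get the sec--<file stem> ids that the regexes above link to. The function name combine_chapters is hypothetical.

```rust
/// Concatenate chapters into one markdown document, tagging each chapter's
/// first top-level heading with a Pandoc header attribute `{#sec--<stem>}`.
/// `chapters` holds (file stem, markdown source) pairs in reading order.
fn combine_chapters(chapters: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (stem, source) in chapters {
        let mut tagged = false;
        for line in source.lines() {
            if !tagged && line.starts_with("# ") {
                // Pandoc syntax for an explicit id: `# Title {#id}`.
                out.push_str(&format!("{} {{#sec--{}}}\n", line.trim_end(), stem));
                tagged = true;
            } else {
                out.push_str(line);
                out.push('\n');
            }
        }
        // Blank line so the next chapter's heading starts a new block.
        out.push('\n');
    }
    out
}
```

A real implementation would also need to skip fenced code blocks, since lines starting with "# " inside code (for example the hidden lines in the Rust book's examples) are not headings.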
You might also want to look at adjust_header_level.rs and adjust_reference_names.rs.
Ok thanks for all the information, this will probably help @asolove a lot! :)
Not sure if this will help you guys, but I've created a simple Rust tool which collates multiple markdown files into one, resolving internal links and turning them into anchor links.
We can use this in a pipeline on the way to converting to PDF:
mdcollate book-example/src/SUMMARY.md | pulldown-cmark > test.html && wkhtmltopdf test.html test.pdf
Code can be found here: https://github.com/cetra3/mdcollate
Happy to accept any PRs
@cetra3 That is really cool!
The plan is to make a "renderer" that does everything so that it can be used with the mdbook build command. So using a command line tool adds some complications. Have you thought about exposing the functionality as a crate?
I am not sure I would add a dependency just for that functionality, because there is always the possibility that it will not be maintained actively. But it could be considered if it offers enough useful methods that we wouldn't have to reinvent here.
I'm also sceptical about Calibre. We use it in the Russian translation of TRPL and we've run into several problems with EPUB (links are to descriptions in Russian, for reference):
- EPUB isn't displayed correctly and has broken links
- EPUB has question marks instead of characters
- EPUB has duplicated code blocks
- I also vaguely remember we had to hack styles in order to get better PDF. Not sure if it's necessary or not with Pandoc
Thanks for sharing your experience :) We will see if pandoc has the same problems, but I think @killercup used it without too much / any problems?
I also vaguely remember we had to hack styles in order to get better PDF. Not sure if it's necessary or not with Pandoc
I am not sure how this is handled with Pandoc, but having a custom theme could be a good thing.
It's probably possible to wrap up those command line tools into a combined tool or expose them as a Rust library. The last component (HTML to PDF) would need to use FFI, as wkhtmltopdf is a native C++ library (exposing a C API) rather than a Rust crate. I'm not sure whether that adds too many external dependencies, though.
The complication is that markdown can embed arbitrary HTML, which effectively makes it a superset of HTML, so you need something that can present HTML in a printable fashion. In my experience with this problem, Pandoc and Calibre handle a subset of it, but you won't get full parity.
There are a few things to be aware of, but in general pandoc is really amazing at converting Markdown to LaTeX, which is what you want, I think. LaTeX has some very nice features that you currently can't get with HTML-to-PDF converters. For example, my PDF versions of the Rust Book include cross-references like "This is a mutable variable binding (section 5, page 163)".
If you're no LaTeX wizard (I'm not), you might want to look at this template I threw together.
If you have any issues with this, just mention me.
Thanks for all your help Pascal! I will definitely look at what you have currently running and I am pretty sure we will end up stealing a lot of your code (if that is ok with you) :wink:
+1 for the effort, I am looking forward to using mdbook to produce ebooks.
It seems to have stalled a bit, is anyone currently working on this?
Indeed, it has stalled a bit. In the last 6 months I have been overwhelmed with work at school :confused:
I am (very) slowly working on the refactoring / clean-up that I wanted to do. And that work is probably going to change the way this specific feature is going to be implemented. Hopefully I will have some time in September to make significant progress on the internal rewrite so that I can work on new features again.
@azerupi How much space is there for discussing this feature? There are some specific things I would be looking for in a CLI ebook helper, but maybe you are already determined in which way to go.
Some time ago I wrote prophecy, a Ruby gem to automate the tasks I needed when producing ebooks. This is an example of the output. It has been very useful for me, but I believe I am the only user :)
I have been wanting to rewrite it with the hindsight gained since its early days, but when I saw this I thought maybe mdbook would be able to produce the same results.
There is an asciinema recording to show the sort of things it does.
I'm open to all ideas :)
Thanks. I will gather my thoughts and post a longer comment on what ideas worked.
When I was building prophecy, it was important to be able to:
- A) quickly build a well-designed ebook from minimal input (i.e. just the manuscript in markdown and the TOC sequence)
- B) but let the lib read settings in the book's folder to influence structural behaviour on a per-book basis,
- C) have access to the attributes of the book and the chapters in the manuscript files (such as the ERB-style <%= chapter.title %>, or inserting the ISBN number with <%= book.isbn_ebook %>),
- D) add custom CSS with @font-face embedded fonts,
- E) build the EPUB and MOBI with different settings,
- F) generate valid ebooks, trigger no warnings in Sigil's validator
- G) generate both the toc.ncx (for the TOC menu) and an HTML Contents page which the reader sees after the title page
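Point C) amounts to placeholder substitution in the manuscript. prophecy used ERB for this; the sketch below is not ERB, just a minimal std-only illustration that assumes placeholders of the exact form <%= key %> and a flat key/value map, treating names like book.isbn_ebook as opaque keys. The function name render_placeholders is hypothetical.

```rust
use std::collections::HashMap;

/// Replace every `<%= key %>` placeholder with its value from `vars`.
/// Unknown keys are left in place so they are easy to spot in the output.
fn render_placeholders(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("<%=") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        match after.find("%>") {
            Some(end) => {
                let key = after[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    // Keep the original placeholder text for unknown keys.
                    None => out.push_str(&rest[start..start + 3 + end + 2]),
                }
                rest = &after[end + 2..];
            }
            // Unterminated placeholder: copy the remainder verbatim.
            None => {
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```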
I thought it was going to take an insane number of config options, but after a while these turned out to be sufficient for any book; this much was enough.
I should add that LaTeX is mentioned a lot in the sources, but I ended up avoiding generating LaTeX content. This might be a different experience if you are not so picky about every paragraph.
I produce books for printing, and the LaTeX files have the most specific hair-trigger accurate tweaks and custom macros, so usually I produce the LaTeX sources first, and convert to markdown from there for the ebooks.
The book's config files were in .yml format, and their option keys would get overwritten from the general towards the specific.
- book.yml - general info (title, author, ISBN)
- epub_mobi.yml - shared data for EPUB and MOBI (chapter list for TOC, language attr, etc.)
- epub.yml - only for epub (such as excluding assets which only go to the mobi)
- mobi.yml - only for mobi
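The general-to-specific override can be sketched as a layered merge. A minimal illustration, assuming each .yml file has already been parsed into a flat map of string keys to string values (YAML parsing and nested keys are left out); merge_layers is a hypothetical name. For an EPUB build the layers would be book.yml, epub_mobi.yml, epub.yml, in that order.

```rust
use std::collections::HashMap;

/// Merge configuration layers from most general to most specific:
/// a key set in a later layer overwrites the same key from earlier layers.
fn merge_layers(layers: &[HashMap<String, String>]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```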
For F), I wanted to be able to track content changes with a diff tool, but Sigil had a habit of renaming the folders. Check out the epub template that eventually worked. The lib also copies Fonts, Images and Styles from assets into the OEBPS folder. See the book Travessia for an example.
(btw I lost the habit of committing the generated epub source files. It is useful to be able to inspect them when working on a book, but committing them was overkill.)
Although the EPUB format is liberal about folder names for images, fonts, etc., I found that when other people made small corrections in the ebook with Sigil, it would rename the folders without asking. The above template avoids this.
For the design in A), three stylesheets were necessary, because you
- want to make it pretty in the EPUB for iPads and such, but
- for the Kindle screen (Paperwhite and newer) you need to tone it down or optimize for contrast and readability, and
- you want to support old Kindles which don't comprehend @font-face CSS.
You can see in page.xhtml.erb that it would select different stylesheets depending on whether it was building for EPUB or MOBI.
The new Kindles use the KF8 format, and you can support the legacy format with the media="amzn-mobi" media query, while the new Kindles will take media="amzn-kf8".
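Selecting stylesheets per target can be sketched as a small function that emits the link tags for a page. This is not prophecy's page.xhtml.erb logic; it assumes hypothetical stylesheet names (style.css, kf8.css, mobi.css) and that the Kindle build gets two links distinguished by the amzn-kf8 and amzn-mobi media values mentioned above.

```rust
/// Output targets for the ebook build.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Target {
    Epub,
    Mobi,
}

/// Format one stylesheet <link> tag, with an optional media attribute.
fn link(href: &str, media: Option<&str>) -> String {
    match media {
        Some(m) => format!(r#"<link rel="stylesheet" type="text/css" href="{}" media="{}" />"#, href, m),
        None => format!(r#"<link rel="stylesheet" type="text/css" href="{}" />"#, href),
    }
}

/// The <link> tags for a chapter page (stylesheet names are placeholders).
fn stylesheet_links(target: Target) -> Vec<String> {
    match target {
        // Full design for EPUB readers such as iPads.
        Target::Epub => vec![link("style.css", None)],
        // KF8 devices use the amzn-kf8 sheet; legacy Kindles the amzn-mobi one.
        Target::Mobi => vec![
            link("kf8.css", Some("amzn-kf8")),
            link("mobi.css", Some("amzn-mobi")),
        ],
    }
}
```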
To generate MOBI, the best practice seems to be to build it as an EPUB, then run Amazon's Kindlegen CLI tool on it, to produce the MOBI.
Now Kindlegen has the strange logic of including the source EPUB in the resulting mobi file, so your output ends up roughly twice as large.
The response to this was the kindlestrip.py script by Paul Durrant, which strips this out from the mobi, see more on what it does in the comments in its header.
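The two-step MOBI conversion can be sketched with std::process::Command. The commands are only constructed here, not run. It assumes kindlegen is on the PATH and writes book.mobi next to book.epub, and that kindlestrip.py is run with a Python interpreter and takes an input and an output file; check both tools' own usage text before relying on this. The function name mobi_commands is hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Prepare the commands for EPUB -> MOBI: kindlegen converts the EPUB,
/// then kindlestrip.py removes the embedded EPUB copy from the result.
fn mobi_commands(epub: &Path, stripped_mobi: &Path) -> (Command, Command) {
    let mut kindlegen = Command::new("kindlegen");
    kindlegen.arg(epub);

    // Assumption: kindlegen's output keeps the stem and uses .mobi.
    let raw_mobi = epub.with_extension("mobi");
    let mut strip = Command::new("python");
    strip.arg("kindlestrip.py").arg(&raw_mobi).arg(stripped_mobi);

    (kindlegen, strip)
}
```

Calling .status() on each command in order (and checking the result) would then run the conversion.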
Back to prophecy, the gem had the stylesheets in the user's gemdir, so the behaviour was:
- initialize a new book without the CSS
- copy from the gemdir when building the ebook files
- allow customizing with a command, prophecy assets, which made a copy of the assets to the book's folder
- if local assets were found, the lib would compile and use those instead of those in the gemdir.
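The lookup in the last point can be sketched in Rust, assuming the local copy lives in a folder named assets inside the book's directory and the bundled defaults are in some other known directory (both the folder name and the function name assets_dir are hypothetical).

```rust
use std::path::{Path, PathBuf};

/// Pick the assets directory for a build: a local `assets` folder in the
/// book's directory wins; otherwise fall back to the bundled defaults.
fn assets_dir(book_dir: &Path, default_assets: &Path) -> PathBuf {
    let local = book_dir.join("assets");
    if local.is_dir() {
        local
    } else {
        default_assets.to_path_buf()
    }
}
```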
Some of this wasn't so good. It turns out that almost all ebooks needed at least a little tweak in either the CSS, the typefaces, etc., and I got into the habit of always including the assets with the book.
This is probably enough for now. Phew, I hope all the links work! :)
The details look a bit scary, but much of the complexity is dealt with in the page templates.
Otherwise it is just this much:
- read in settings
- use local assets when provided
- copy assets to the right folder
- render the ebook data files
- render chapters to HTML
- zip to EPUB
- plus kindlegen, kindlestrip when doing MOBI
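The steps listed above can be sketched as a build plan where MOBI reuses the whole EPUB pipeline and adds two steps at the end. The enum and function names are hypothetical; each step would map to its own function in a real implementation.

```rust
/// Output formats for a build.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Format {
    Epub,
    Mobi,
}

/// One step of the ebook build, in the order listed above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Step {
    ReadSettings,
    ResolveAssets,
    CopyAssets,
    RenderDataFiles,
    RenderChapters,
    ZipEpub,
    Kindlegen,
    Kindlestrip,
}

/// Every format goes through the EPUB steps; MOBI then converts the EPUB
/// with kindlegen and strips the embedded copy with kindlestrip.
fn build_plan(format: Format) -> Vec<Step> {
    let mut steps = vec![
        Step::ReadSettings,
        Step::ResolveAssets,
        Step::CopyAssets,
        Step::RenderDataFiles,
        Step::RenderChapters,
        Step::ZipEpub,
    ];
    if format == Format::Mobi {
        steps.push(Step::Kindlegen);
        steps.push(Step::Kindlestrip);
    }
    steps
}
```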
I would like to contribute code too, but I would need guidance. It has not been long since I read my first Rust tutorial :) I would much rather add to a well-designed outline where I can.
@gambhiro Thanks for all info, I glanced over it and will certainly take a deeper look when the time comes to write the pdf / epub renderer!
In the mean time, I updated the issue for the refactoring I want to do. It outlines at a high level what I think should happen. This should unblock a lot of the most wanted features.
I put the points in the order I think they should be implemented, of course this order doesn't need to be followed strictly.
At this point I think it's important to discuss and iterate over some designs. So if you want to help out I suggest you take a look and start a discussion in the area that interests you :smiley:
There are a couple of issues that can be implemented directly too, like:
- Switch to the log crate
- Switch to Serde
- Replace JS and CSS dependencies with their equivalent from npm, to allow more regular updates
I will happily mentor and answer any questions you have, don't be afraid to ask.
Thanks. I'll level up :)
Crowbook just released an update; I wasn't aware of it at all as a tool for publishing to EPUB/PDF. It might be worth considering as a renderer for these formats.
Yes, she is doing very diligent work on that project. I have been working on the ebooks for the past few days, and the first thing I did was read what she wrote in crowbook for this.
It parses the book's files to a data structure, selects the output format from defaults or what the user said on the cli, and gives the data to a function that will do whatever it needs to produce the output files.
Pretty much the obvious thing huh? Basically the task of rendering some markdown files to different formats is a simple and well-defined problem in its nature, so there won't be a lot of different solutions that make sense. It's a lot like the static-site generator idea as well.
One pleasantly surprising thing which @lise-henry does there, though, is that she parses the documents into an abstract syntax tree (basically a tree of tokens down to paragraph, link, emphasis) and the output functions iterate over that data, instead of each output function reading in the markdown file for its own purposes and doing something with it.
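The separation between parsing and output can be sketched with a small token enum and two renderers that walk the same tree. These types are hypothetical and much simpler than crowbook's actual data structures; the point is that adding a format means adding one function over the tree, not re-reading the markdown.

```rust
/// A tiny token tree: blocks and inlines nested down to plain text.
enum Token {
    Paragraph(Vec<Token>),
    Emphasis(Vec<Token>),
    Link { url: String, inner: Vec<Token> },
    Text(String),
}

/// HTML output: one recursive walk over the tree.
fn to_html(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Paragraph(inner) => format!("<p>{}</p>\n", to_html(inner)),
            Token::Emphasis(inner) => format!("<em>{}</em>", to_html(inner)),
            Token::Link { url, inner } => format!("<a href=\"{}\">{}</a>", url, to_html(inner)),
            Token::Text(s) => s.clone(),
        })
        .collect()
}

/// LaTeX output: the same walk, different markup.
fn to_latex(tokens: &[Token]) -> String {
    tokens
        .iter()
        .map(|t| match t {
            Token::Paragraph(inner) => format!("{}\n\n", to_latex(inner)),
            Token::Emphasis(inner) => format!("\\emph{{{}}}", to_latex(inner)),
            Token::Link { url, inner } => format!("\\href{{{}}}{{{}}}", url, to_latex(inner)),
            Token::Text(s) => s.clone(),
        })
        .collect()
}
```

Escaping of HTML and LaTeX special characters in text and URLs is omitted to keep the sketch short.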
That token tree allows the content to be processed in terms of writing prose or typographical customs, and so she wrote a module that lets crowbook have opinions about the grammar in the text or correct punctuation. She's a novel writer so it makes sense she would be interested in an automated sanity check which the machine can do.
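Because prose lives only in the text leaves of such a tree, a typographic pass can rewrite those leaves without touching URLs or markup. A minimal sketch with a hypothetical token type and two example rules (straight apostrophes become typographic ones, runs of spaces collapse to one); crowbook's actual cleaning rules are its own and more thorough.

```rust
/// Minimal token tree for this example (hypothetical).
enum Token {
    Text(String),
    Link { url: String, inner: Vec<Token> },
}

/// Apply typographic fixes to text leaves only; URLs are left untouched.
fn clean(tokens: &mut [Token]) {
    for token in tokens.iter_mut() {
        match token {
            Token::Text(s) => *s = clean_text(s),
            Token::Link { inner, .. } => clean(inner),
        }
    }
}

/// Example rules: ' becomes a right single quotation mark, and consecutive
/// spaces collapse to one (leading/trailing single spaces are kept).
fn clean_text(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut prev_space = false;
    for c in s.chars() {
        if c == ' ' {
            if !prev_space {
                out.push(' ');
            }
            prev_space = true;
        } else {
            out.push(if c == '\'' { '\u{2019}' } else { c });
            prev_space = false;
        }
    }
    out
}
```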
I remember thinking, "hm, why don't I just use crowbook?" and the only convincing point I could come up with is that I like writing the code :) Which is pretty much why she started writing crowbook as well. I think we like to be at that place, writing code to solve a problem we understand.
There is this TED talk by Uber's founder Travis Kalanick, where he is asked why he is doing this. He says, "... the way I like to describe it is it's kind of like a math professor. You know? If a math professor doesn't have hard problems to solve, that's a really sad math professor."
Have you tried to write a basic but efficient primality test? I think you would find it satisfying!
Best wishes for the new year!