docs.scala-lang
Automatically generate PDF, ePub, and MOBI versions of Scala Book
This is the start of some notes on how to automatically create PDF, ePub, and MOBI versions of Scala Book. I have already generated first versions of these documents with a manual process (excluding a couple of problems noted below), and this is a writeup of how to automate the process.
To create an ePub document
- Copy all markdown files and LIST_OF_FILES_IN_ORDER from the website directory (_overviews/scala-book) to a working directory
- Add a `#` title tag to each *.md file
  - get the title from the header section
- prepend a chapter number to each title
- Remove the header content from all *.md files
- Transform all `<pre>` sections to use only four backticks
  - all “fenced code blocks” need to be transformed (scala, java, etc.) to use four backticks
  - actually, I know this is necessary for the PDF, but it might not be necessary for the ePub and MOBI versions
- Generate a Pandoc command that includes all *.md files in the proper order; this command looks like this:
pandoc -o ScalaBook.epub \
metadata.txt \
working/introduction.md \
working/prelude-taste-of-scala.md \
working/preliminaries.md \
50 more lines here ...
I have code to do all of that; I just need some free time to clean it up and automate it.
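As a rough illustration of the header steps in the list above (this is not the existing code mentioned here), the title extraction and chapter numbering could look something like the following Python sketch. It assumes the header section is Jekyll-style front matter delimited by `---` lines with a `title:` key; the function name `retitle` and the `N. Title` numbering format are made up for the example.

```python
import re
from typing import Optional

# Assumption: the "header section" is front matter between two '---' lines
# at the very top of the file.
FRONT_MATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n", re.DOTALL)


def retitle(markdown: str, chapter: Optional[int] = None) -> str:
    """Replace the front matter with a single '# Title' heading.

    If `chapter` is given, it is prepended to the title (ePub case);
    otherwise the title is used as-is (PDF case).
    """
    match = FRONT_MATTER.match(markdown)
    if match is None:
        return markdown  # no header section found: leave the file untouched

    title = "Untitled"
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "title":
            title = value.strip().strip("\"'")
            break

    prefix = f"{chapter}. " if chapter is not None else ""
    body = markdown[match.end():].lstrip("\n")
    return f"# {prefix}{title}\n\n{body}"
```

Calling `retitle(text, 3)` on a file whose header contains `title: Hello` would produce a document starting with `# 3. Hello`; leaving out the chapter number gives the variant needed for the PDF.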
To create a MOBI document
There are more elaborate ways to do this, but at the moment this command seems to work, generating the MOBI document from the ePub:
kindlegen ScalaBook.epub
It looks like KindleGen is available for Linux, MacOS, and Windows:
To create a PDF document
I currently have a way to do this, but it will take a little time to automate. The first few steps in the process are similar to the ePub process, but you don’t need to add a chapter number:
- Copy all markdown files and LIST_OF_FILES_IN_ORDER from the website directory (_overviews/scala-book) to a working directory
- Add a `#` title tag to each *.md file
  - get the title from the header section
- Remove the header content from all *.md files
- Transform all `<pre>` sections to use only four backticks
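One possible reading of the code-block step, sketched in Python: every fence line such as ` ```scala ` is rewritten to a bare four-backtick fence, dropping the language tag. This is only an interpretation of the step, not the actual script, and it handles backtick fences only; `<pre>` tags and the name `normalize_fences` are outside what the notes specify.

```python
import re

# A fence line: optional indentation, three or more backticks, then an
# optional info string (e.g. "scala") that contains no further backticks.
FENCE = re.compile(r"^(\s*)`{3,}[^`]*$")


def normalize_fences(markdown: str) -> str:
    """Rewrite every opening/closing fence line to a bare four-backtick fence."""
    out = []
    for line in markdown.splitlines():
        m = FENCE.match(line)
        out.append(f"{m.group(1)}````" if m else line)
    trailing = "\n" if markdown.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Lines with inline code are left alone because they contain backticks after the opening run, so the pattern does not match them.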
After that the steps are:
- Convert all of the Markdown files to LaTeX files
- Generate the PDF using a `latexmk` command
  - Historically I have manually worked through any issues that come up at this time
Known problems
Generating the PDF
- LaTeX doesn’t like the trick I used with the “Prelude” title, so that has to be replaced
- The PDF-generating process gets stuck on a line somewhere near this:
'', k, v) val keys = m.keys val values = m.values val contains3 =
I haven’t had the time to look into that yet.
Generating the ePub document
- This process fails on the tables in the following files, so this needs to be looked into:
- built-in-types.md
- collections-101.md
- The `{::comment}` syntax shows up in the MOBI document, so it’s probably also in the ePub
  - I’ll submit a pull request to delete all comments from Scala Book
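Until the comments are removed from the source files, they could also be stripped in the working directory before running Pandoc. The Python sketch below assumes the comments are kramdown comment blocks that open with `{::comment}` and close with `{:/comment}` (or the short form `{:/}`), each on its own line; `strip_kramdown_comments` is a name invented for this example.

```python
import re

# Matches a whole kramdown comment block, from the opening '{::comment}'
# line through the closing '{:/comment}' or '{:/}' line.
COMMENT_BLOCK = re.compile(
    r"^\{::comment\}.*?^\{:/(?:comment)?\}[ \t]*\n?",
    re.DOTALL | re.MULTILINE,
)


def strip_kramdown_comments(markdown: str) -> str:
    """Remove kramdown comment blocks so they do not leak into the output."""
    return COMMENT_BLOCK.sub("", markdown)
```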
Tools
So far my “tools” for generating these documents are:
- Unix shell scripts, including `sed` commands
- Some custom Scala scripts
  - I wrote these to remove the Markdown header content, and add `#` title tags to the resulting Markdown files
- Pandoc
  - This is used to generate the ePub and MOBI versions
- LaTeX
  - This is used to generate the PDF
  - I use a Mac, and I think I installed the tools (several years ago) with MacTeX
What I need
All I really need to complete this process is some free time on my part, plus confirmation that the tools listed above will be available on the server. Assuming I can work through the problems listed, the whole process is really:
- Copy the website Markdown files to a working directory
- Transform their header sections
- Convert the *.md files to *.tex files, and generate the PDF with the `latexmk` command
- Generate the ePub with the `pandoc` command
- Generate the MOBI with the `kindlegen` command
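To make the PDF and MOBI steps above a little more concrete, here is a Python sketch that only builds the command lines, without running them. Several things in it are assumptions rather than the actual setup: that Pandoc does the Markdown-to-LaTeX conversion, that a master file called `ScalaBook.tex` pulls the chapter files together, and that the converted files go into a `tex` directory.

```python
from pathlib import Path


def pdf_and_mobi_commands(md_files, tex_dir="tex",
                          master="ScalaBook.tex", epub="ScalaBook.epub"):
    """Return the command lines (as argv lists) for the PDF and MOBI steps."""
    commands = []
    for md in md_files:
        # One .tex file per Markdown chapter (assumed layout).
        tex = (Path(tex_dir) / (Path(md).stem + ".tex")).as_posix()
        commands.append(
            ["pandoc", "-f", "markdown", "-t", "latex", "-o", tex, str(md)]
        )
    # Build the PDF from the (hypothetical) master LaTeX file.
    commands.append(["latexmk", "-pdf", master])
    # Build the MOBI from an ePub that has already been generated.
    commands.append(["kindlegen", epub])
    return commands
```

Each entry can then be passed to `subprocess.run(cmd, check=True)` in order, so the run stops at the first tool that fails.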
I think the ePub and MOBI files can also use a stylesheet, so that’s something else to be looked into, but I’m more concerned about automating these processes at the moment.
I’m getting back into this process this weekend, and I just want to note that the main part of the ePub process is running this command after making some slight tweaks to the Markdown files:
pandoc -o ScalaBook.epub \
metadata.txt \
introduction.md \
prelude-taste-of-scala.md \
preliminaries.md \
... many more markdown filenames here ...
For that command, the metadata.txt file looks like this:
---
title: Scala Book
author: Alvin Alexander, et al
rights: Creative Commons Non-Commercial Share Alike 3.0
language: en-US
---
You can add more information to that file, as well as a stylesheet, as described in this medium.com article.
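Since the command has more than fifty filenames, it is easier to generate than to type. A minimal Python sketch, assuming LIST_OF_FILES_IN_ORDER contains one Markdown filename per line (the actual format of that file is not described here) and using a made-up function name, `epub_command`:

```python
from pathlib import Path


def epub_command(order_listing: str, working_dir=".",
                 output="ScalaBook.epub", metadata="metadata.txt"):
    """Build the pandoc argv list from the contents of the ordering file."""
    names = [line.strip() for line in order_listing.splitlines()]
    chapters = [
        (Path(working_dir) / name).as_posix()
        for name in names
        if name  # skip blank lines
    ]
    # Metadata first, then the chapters in book order.
    return ["pandoc", "-o", output, metadata, *chapters]
```

The listing would typically come from `Path("LIST_OF_FILES_IN_ORDER").read_text()`, and the returned list can be handed to `subprocess.run(cmd, check=True)`.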
That `pandoc` command currently has a problem with Markdown tables, which is what I’ll work on next. But once you have a working ePub file, you can then create a MOBI file with this command:
kindlegen ScalaBook.epub
The PDF-generating process is much more involved, but the ePub and MOBI process is pretty simple.
I put the initial tools for creating EPUB and MOBI versions here:
I don’t know how to get the website’s markdown files into the proper directory, but that should be a simple configuration change.
I’ll work on the PDF process next.
Any update on this? I'd love to read "A Taste of Scala 3" as a PDF on my reMarkable. Right now this chapter is split into many sections (I can do "print" from Chrome for each section, but it's a bit annoying; I'd rather have a nice clean PDF of the whole chapter, or even better, of the whole book).
Note that it was less a problem for the Scala 2 book because "A Taste of Scala" was a single file: https://docs.scala-lang.org/overviews/scala-book/prelude-taste-of-scala.html so it was easy to print from Chrome as a PDF.
Any update on this?
Technically I can generate PDF and MOBI versions pretty quickly. I think we were just waiting to have the book reviewed to get it in better shape. Then I just need to take the time to remember the process. :)