typst Support EPUB output

With so many different device formats, screen sizes and aspect ratios, having a format that's more dynamic and geared towards reflowability is really important. PDFs are nice, but they always have a fixed size and don't work well on many of the devices people use now.

It would be a good step forward for a modern LaTeX alternative to support it. I always hate that you have to have two workflows: one for PDF with LaTeX, and a separate one with whatever tool you prefer for EPUB. It would be amazing to combine them.

It was previously mentioned in the comments on the issue about HTML output, but I think EPUB itself is important enough to warrant a separate issue, since there's still quite a bit on top of just HTML.

Mar 23 '23 05:03 AliciaBytes

cf. #114

Why should ePub output be the responsibility of a typesetting engine? I don't think making it a format conversion jackknife is a good plan: better to leave that to Pandoc or other tools and save the typesetting engine for output formats where you actually have the control to implement deterministic layout.

Mar 23 '23 06:03 alerque

cf. #114

Why should ePub output be the responsibility of a typesetting engine? I don't think making it a format conversion jackknife is a good plan: better to leave that to Pandoc or other tools and save the typesetting engine for output formats where you actually have the control to implement deterministic layout.

EPUB is just a ZIP file containing HTML and some other metadata files. I personally would be ok with only supporting HTML output, and then preparing the ZIP file externally with other tools, but I think that there should be a way to obtain an EPUB from a Typst document.
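As a rough illustration of how thin that packaging layer is, here is a minimal Python sketch (standard library only) that wraps one already-rendered XHTML chapter into an EPUB 3 container. It is not a validated EPUB writer: the `OEBPS/` layout, the file names, the placeholder identifier and the modification date are my own assumptions. What the EPUB Open Container Format does require is that `mimetype` is the first entry and stored uncompressed, and that `META-INF/container.xml` points at the package document.

```python
import html
import io
import zipfile

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{uid}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2023-03-23T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine><itemref idref="ch1"/></spine>
</package>"""

XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>{body}</body>
</html>"""

NAV = '<nav epub:type="toc"><ol><li><a href="ch1.xhtml">{title}</a></li></ol></nav>'


def build_epub(title: str, body_html: str,
               uid: str = "urn:uuid:00000000-0000-0000-0000-000000000000") -> bytes:
    """Return the bytes of a single-chapter EPUB whose chapter body is body_html."""
    t = html.escape(title)
    deflate = zipfile.ZIP_DEFLATED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF: "mimetype" must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER, compress_type=deflate)
        zf.writestr("OEBPS/content.opf", OPF.format(uid=uid, title=t), compress_type=deflate)
        zf.writestr("OEBPS/nav.xhtml", XHTML.format(title=t, body=NAV.format(title=t)),
                    compress_type=deflate)
        zf.writestr("OEBPS/ch1.xhtml", XHTML.format(title=t, body=body_html),
                    compress_type=deflate)
    return buf.getvalue()
```

For example, `build_epub("Demo", "<p><em>emphasized</em></p>")` yields a small archive. The hard part is producing good `body_html` in the first place, which is what the rest of this thread is about.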

The thing is that both Typst and LaTeX conflate generating a specific layout and generating content.

When converting to HTML we might have to forgo the layout, but the content can still be usable. In principle, there is no reason why \emph{emphasized} or _emphasized_ could not be converted to <em>emphasized</em>.

Pandoc is not a solution in this case, because it would have to take as input the output of Typst, and if Typst cannot output to HTML then it would have to be PDF, and Pandoc does not support PDF as input. In general, PDF is a sink format: content cannot be (reliably) extracted from PDF files.

It really would be the same issue as with LaTeX. Pandoc can take trivial LaTeX as input. If your document only contains \section{}, \emph{} and maybe a few specific commands such as \cite{}, then Pandoc can parse them. But as soon as you have custom commands, Pandoc will be useless, because it does not implement a complete LaTeX (or Typst) engine.
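To make the point concrete, here is a hypothetical, deliberately tiny source-level converter in Python. It is not how Pandoc is implemented; it only shows the failure mode: commands the converter recognizes are mapped, while a user-defined macro such as `\term` passes through unexpanded, because there is no macro engine to evaluate its definition.

```python
import re

# Hypothetical converter that knows exactly two LaTeX commands.
KNOWN = {
    "emph": "<em>{}</em>",
    "section": "<h2>{}</h2>",
}


def naive_latex_to_html(src: str) -> str:
    """Rewrite \\cmd{arg} for known commands; leave everything else untouched."""
    def repl(m: re.Match) -> str:
        cmd, arg = m.group(1), m.group(2)
        if cmd in KNOWN:
            return KNOWN[cmd].format(arg)
        # Unknown (e.g. user-defined) macro: without an engine it leaks through as-is.
        return m.group(0)

    return re.sub(r"\\([A-Za-z]+)\{([^{}]*)\}", repl, src)
```

`naive_latex_to_html(r"\emph{hi}")` gives `<em>hi</em>`, but `\term{typesetting}` comes back unchanged even if the document defines `\term` to mean `\emph`.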

Take the Fibonacci example from the Typst README. The table is generated through the Typst scripting engine (and something similar can easily be done in LaTeX, too). The output is a table, which is perfectly representable in HTML (and most other Pandoc output formats), but Pandoc would not be able to run that script.

Or take LaTeX's \printbibliography. The output is just formatted text, but Pandoc cannot compute it (well, it actually can, but only because its parser looks for that specific command and implements it internally).

Mar 23 '23 07:03 claudiomattera

EPUB is just a ZIP file containing HTML and some other metadata files. I personally would be ok with only supporting HTML output

I know. I have the same issue with HTML, I don't think a typesetter should be responsible for outputting HTML.

When converting to HTML we might have to forgo the layout, but the content can still be usable. In principle, there is no reason why \emph{emphasized} or _emphasized_ could not be converted to <em>emphasized</em>.

Sure there is. When the commands / functions / etc. used to typeset are configurable, they may or may not be used to correspond to <em>; any attempt to map them is a best-guess scenario.

Pandoc is not a solution in this case, because it would have to take as input the output of Typst

No, it could take as input the input for Typst, not the output. We're talking about document format conversion of the input format here, not understanding typeset output.

I understand what you are saying about the pitfalls here. I have tons of LaTeX experience and am the maintainer of SILE, an alternative to Typst. SILE does implement a plain-text output mechanism to extract the text (including programmatically generated strings) that would have been output, but this does not scale well to mapping to HTML or some other markup format. At the end of the day our recommendation for SILE is that you use an input document format such as Markdown or XML, and use SILE to typeset it but some other program to generate an ePub from the same sources. I don't understand why people seem to think Typst could/should be fundamentally different.

Mar 23 '23 09:03 alerque

I don't understand why people seem to think Typst could/should be fundamentally different.

My background is in retrofitting a typesetting system (TeX) to generate HTML. Any new project that aims to innovate beyond the status quo paradigm has the option to ambitiously emit both PDF and HTML, and make internal provisions in all places where there is a conflict/impedance mismatch between the formats.
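One concrete example of such an impedance mismatch (my own illustration, not taken from the thread) is a cross-reference like "see page 7": the page number only exists after paged layout, and a reflowable format has no pages at all. A minimal Python sketch, assuming a hypothetical shared content tree with a `Ref` node that each backend renders in its own way:

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Hypothetical cross-reference node in a format-independent content tree."""
    target: str  # label of the referenced element


def render_ref_paged(ref: Ref, page_of: dict[str, int]) -> str:
    # Paged output: resolved after layout, using the page the target landed on.
    return f"page {page_of[ref.target]}"


def render_ref_html(ref: Ref) -> str:
    # Reflowable output: no pages exist, so the same node becomes a hyperlink.
    return f'<a href="#{ref.target}">{ref.target}</a>'
```

The design choice is that the node records what the author meant (a reference to a label), and each backend decides how to express it, instead of the document hard-coding a page number.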

Judging by @laurmaedje's thesis "Typst: A Programmable Markup Language for Typesetting", they're well aware of the details here, and their "content model" layer is well-positioned to also emit structural representations.

Otherwise you'll end up in the same place where TeX/LaTeX are, needing custom third party projects to retrofit some unreliable subset of the syntax to HTML (or relying on PDF.js, if that's your cup of tea).

So I'll very much encourage the Typst team to look into a native HTML output, or - more generically - making enough provisions in the Typst kernel to bookkeep and emit structural markup for near-arbitrary content. That kind of capability should also separately aid emitting Tagged PDF, where appropriate.

Mar 23 '23 13:03 dginev

Sure there is. When the commands / functions / etc. used to typeset are configurable, they may or may not be used to correspond to <em>; any attempt to map them is a best-guess scenario.

Yes, in LaTeX you can redefine \emph{} to do something different than italic text, and I suppose in Typst you can redefine underscores, too. But that is just the input.

At some point, Typst will output some italic text in the PDF. Maybe because the input was between underscores, or because it was between redefined stars, or because it was manipulated in some other way by functions. The reason does not really matter; what matters is that it would output some italic text. At that point, Typst could output an <em> element instead.
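A minimal sketch of this argument in Python, assuming a hypothetical content tree in which emphasis survives as an `Emph` node after all user functions have run. The `emph_style` callback stands in for a user redefinition (a show rule, in Typst terms): it changes how the paged output looks, while the HTML backend can still emit `<em>`. The caveat raised above still applies: text styled italic directly, with no emphasis node behind it, gives the HTML backend nothing semantic to map.

```python
import html
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Emph:
    """Hypothetical semantic node meaning 'this text is emphasized'."""
    body: str


Node = Union[str, Emph]


def render_paged(nodes: List[Node], emph_style: Callable[[str], str]) -> str:
    # emph_style plays the role of a user redefinition: it only decides the look.
    return "".join(emph_style(n.body) if isinstance(n, Emph) else n for n in nodes)


def render_html(nodes: List[Node]) -> str:
    # The HTML backend ignores the styling and keeps the semantics.
    return "".join(
        f"<em>{html.escape(n.body)}</em>" if isinstance(n, Emph) else html.escape(n)
        for n in nodes
    )
```

With `doc = ["Typst is ", Emph("fast"), "."]`, any choice of `emph_style` changes `render_paged(doc, ...)`, but `render_html(doc)` is always `Typst is <em>fast</em>.`.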

No, it could take as input the input for Typst, not the output. We're talking about document format conversion of the input format here, not understanding typeset output.

I am not talking about document format conversion. I am talking about generating an HTML file (and ultimately EPUB) from Typst, the same way as Typst generates a PDF file. I.e. treating both HTML and PDF as first-class output formats.

If you take as input the input for Typst, then you need to implement the whole Typst scripting engine to actually generate the output. Typst obviously already implements the Typst scripting engine. Pandoc does not. What would Pandoc do when it encounters something like this (taken from the Typst README as a showcase for what can be done)?

#let count = 8
#let nums = range(1, count + 1)
#let fib(n) = (
  if n <= 2 { 1 }
  else { fib(n - 1) + fib(n - 2) }
)

The first #count numbers of the sequence are:

#align(center, table(
  columns: count,
  ..nums.map(n => $F_#n$),
  ..nums.map(n => str(fib(n))),
))
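To make concrete what "running the script" involves, here is the same computation in Python, using the usual convention F(1) = F(2) = 1, emitted directly as a two-row HTML table (one header row, one value row, matching `columns: count`). Rendering F<sub>n</sub> with `<sub>` is a simplification of the math content. Whatever tool produces this table has to evaluate `fib`, `range` and `map`; a converter that only parses markup cannot.

```python
def fib(n: int) -> int:
    # F(1) = F(2) = 1, F(n) = F(n - 1) + F(n - 2)
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)


def fib_table_html(count: int = 8) -> str:
    nums = range(1, count + 1)
    header = "".join(f"<td>F<sub>{n}</sub></td>" for n in nums)
    values = "".join(f"<td>{fib(n)}</td>" for n in nums)
    return f"<table><tr>{header}</tr><tr>{values}</tr></table>"
```

`[fib(n) for n in range(1, 9)]` is `[1, 1, 2, 3, 5, 8, 13, 21]`, the eight values in the table's second row.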

The only two ways it can work are:

1. Reimplementing the whole Typst scripting engine in Pandoc. Besides being infeasible, why would we need Typst in the first place?
2. Reimplementing a small subset of the Typst scripting engine in Pandoc. This is what Pandoc does with LaTeX.

This effectively means that you need to give up many of LaTeX's features when feeding it as input to Pandoc (case in point: most LaTeX packages do not work in Pandoc).

It would be the same with Typst. Even the simple Fibonacci example might not be supported.

this does not scale well to mapping to HTML or some other markup format

Sure, there are many LaTeX documents that finely manipulate the structure of the output in a way that is not easily representable in HTML. I use LaTeX to print bottle labels, I have used it to print posters in the past, and I suppose I can use Typst in the same way. I even use LaTeX to produce standalone charts with TikZ and pgfplots, which I later convert to SVG and use elsewhere. I do not expect to get anything usable by converting those kinds of documents to HTML.

A lot of LaTeX documents' output, on the other hand, is essentially formatted text, with maybe some floating figures, tables or listings, and some maths. Most books and theses are like that, barring maybe the title page and other page decorations. Those are perfectly representable in HTML.

I can take a typical book typeset in LaTeX or Typst, copy and paste from the PDF, manually fix the text formatting, and obtain an HTML document with the same content as the PDF. It is a very time-consuming and error-prone process, but it is mostly mechanical. There are tools that can automate it, but they work poorly because extracting formatted text from PDF is complicated: you lose all semantics, and you can only extract letters, positions and fonts. Are those lines different paragraphs, or are they cells of a borderless table? Can you tell the difference just by looking at the PDF?
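A toy model of that question in Python, assuming a drastically simplified layout step (real PDFs also store fonts and exact glyph positions, and Tagged PDF can carry structure, but an untagged PDF records no more about which element produced a line than this does). Two different structures go in; the element type is used only for placement and is never written out, so the outputs are indistinguishable:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class TableCell:
    text: str


Block = Union[Paragraph, TableCell]
GlyphRun = Tuple[float, float, str]  # (x, y, text): position and letters only


def toy_layout(blocks: List[Block], line_height: float = 12.0) -> List[GlyphRun]:
    """Place each block on its own line at the left margin, dropping its type."""
    return [(0.0, i * line_height, b.text) for i, b in enumerate(blocks)]
```

Both `toy_layout([Paragraph("Alpha"), Paragraph("Beta")])` and `toy_layout([TableCell("Alpha"), TableCell("Beta")])` produce `[(0.0, 0.0, "Alpha"), (0.0, 12.0, "Beta")]`; an extractor reading that output has no way to tell which structure it came from.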

If instead we could directly output to HTML from whatever intermediate representation is used by Typst, it would be much more effective.

At the end of the day our recommendation for SILE is that you use an input document format such as Markdown or XML, and use SILE to typeset it but some other program to generate an ePub from the same sources. I don't understand why people seem to think Typst could/should be fundamentally different.

But then what would even be the appeal of using Typst?

If you use Typst only as an intermediate format, you would not be using any of its scripting capabilities, and it does not seem to me that Typst markets itself just as a typesetter. You would have to use whatever is available in your source format (for Markdown I suppose you would use some other language's template engine). Since Typst already showcases its scripting capabilities in the very readme, I assume that those are considered a significant feature.

Same as LaTeX, again. Why would I use LaTeX if, instead of using biblatex, or cleveref, or glossaries, I were forced to manually typeset the bibliography, complex intra-text references, or acronyms?

Rephrasing: how would you suggest to handle, say, acronyms when you are using Typst only as an intermediate format, and having your source as Markdown?
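For illustration, acronym handling boils down to stateful processing over the whole document: the first use is expanded, later uses are abbreviated. That state has to be computed by some engine before either a PDF or an HTML file can be written. A hypothetical Python sketch (the `@ACR` marker syntax is invented here, loosely mimicking the first-use behaviour of LaTeX acronym packages):

```python
import re


def expand_acronyms(text: str, glossary: dict[str, str]) -> str:
    """Expand each @ACR marker on first use and abbreviate it afterwards."""
    seen: set[str] = set()

    def repl(m: re.Match) -> str:
        key = m.group(1)
        if key not in glossary:
            return m.group(0)  # unknown marker: leave it untouched
        if key in seen:
            return key
        seen.add(key)
        return f"{glossary[key]} ({key})"

    return re.sub(r"@([A-Z]+)\b", repl, text)
```

With `{"PDF": "Portable Document Format"}`, the text `@PDF is fixed; @PDF is not reflowable.` becomes `Portable Document Format (PDF) is fixed; PDF is not reflowable.`, and that result is equally valid input to a PDF or an HTML backend.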

I think your suggestion severely restricts the scope and usefulness of Typst (or LaTeX, likewise).

Just to be clear, I fully understand that this might be a lot of work and, while a bit disappointed, I would understand if the response were "sorry, out of scope / too much work / not possible due to design decisions".

I also think that LaTeX's inability to generate EPUBs is a severe shortcoming in 2023, and alternatives should aim to overcome it.

LaTeX might have a good excuse due to being ancient; it was designed when books could only be consumed on physical paper, or later on a computer screen. If you think about it, a PDF is equivalent to a BMP image. Sure, it is vector-based and stored and displayed more efficiently, but we consume PDF files as images: by looking at them (even copy and paste works horribly).

Typst, on the other hand, is entering the world at a time when eBook readers have existed for a while and are becoming more and more popular. Those do not work well with PDFs or images; they need to be able to dynamically reflow the content.
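A minimal sketch of what reflowing means, using greedy line breaking in Python. This is a simplification (TeX uses the more sophisticated Knuth-Plass algorithm for print), but it shows the key property: the same words are re-broken for whatever width the reader's screen provides, which a fixed PDF page cannot do.

```python
def reflow(text: str, width: int) -> list[str]:
    """Greedily pack as many words per line as fit within `width` characters."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        # A single word longer than `width` still gets its own line.
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

For `"EPUB readers reflow text to fit the screen"`, a width of 20 gives three lines (`"EPUB readers reflow"`, `"text to fit the"`, `"screen"`), while a width of 40 gives two; the content is identical, only the line breaks move.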

EDIT: Wow, perhaps I wrote too much :D I guess that all my fighting with LaTeX to obtain something usable in an eBook reader made me emotional about alternatives supporting EPUBs as a first-class output format.

Mar 23 '23 16:03 claudiomattera

The lack of reflow ability of PDF document text severely limits that format's readability on mobile platforms.

Mar 23 '23 19:03 josb

I created a draft PR which (indirectly) adds ePUB output: https://github.com/typst/typst/pull/461

Mar 30 '23 18:03 phiresky

HTML and EPUB output is planned, but will take some time. :)

Apr 02 '23 12:04 laurmaedje

@laurmaedje one year has passed. Are there any updates on this? I'm really looking forward to being able to export to epub.

Apr 27 '24 11:04 ruipires

@laurmaedje one year has passed. Are there any updates on this? I'm really looking forward to being able to export to epub.

Don't get me wrong, I'm happy to help in any way I can. Is there any ongoing work on this? Is there anything at all I can pitch in on?

Apr 28 '24 11:04 ruipires

Don't get me wrong, I'm happy to help in any way I can. Is there any ongoing work on this? Is there anything at all I can pitch in on?

I believe the work on HTML output lays a lot of the groundwork for EPUB output and seems like the priority anyway. So I imagine that's where you'd want to look / help out. I'd look at some of the linked issues in the roadmap for HTML output blockers https://github.com/typst/typst/issues/712

EDIT: Up to date roadmap here (without specific tracking links to work needed though): https://typst.app/docs/roadmap

Apr 29 '24 10:04 dunxen
