[Feature Request]: EPUB to PDF conversion needed

Open Greatz08 opened this issue 9 months ago • 32 comments

Feature Description

:-)

Thank you

Why is this feature valuable?

No response

Suggested Implementation

No response

Additional Information

No response

No Duplicate of the Feature

  • [x] I have verified that there are no existing features requests similar to my request.

Greatz08 avatar Mar 28 '25 03:03 Greatz08

Can I work on this?

0x-2FA avatar Apr 10 '25 23:04 0x-2FA

Absolutely!

Frooodle avatar Apr 10 '25 23:04 Frooodle

Hi @0x-2FA,

I wanted to check in and see if you’ve made any progress on this. This feature has recently become personally relevant to me, and I’m also willing to work on it. Of course, I don’t want to step on your toes if you’re already making good progress or are close to opening a PR.

Let me know where things stand :)

Thanks!

balazs-szucs avatar Jun 14 '25 10:06 balazs-szucs

@Balazs-Szucs Yes, of course. I want that feature myself; that's why I wanted to contribute. Unfortunately, I haven't found the time to dig in. In fact, I cloned the repo only quite recently. You are more than welcome to take the task.

0x-2FA avatar Jun 16 '25 21:06 0x-2FA

Thanks for the response, @0x-2FA, no worries. It's not urgent on my side. Thanks for your work :)

balazs-szucs avatar Jun 16 '25 22:06 balazs-szucs

Will look into it this week; I finally have some free time.

0x-2FA avatar Jul 24 '25 19:07 0x-2FA

I think I have it ready!

@Frooodle But before I do the finishing touches I need your help. Do you think it is better to have a separate tool in the Convert to PDF section or should I simply add it to the existing Convert file to PDF?

Adding it to Convert file to PDF and updating the Supported Files is the best option, imho.

0x-2FA avatar Jul 28 '25 16:07 0x-2FA

I was thinking separate 😂

In V2, coming out in a few months, we are combining everything. But for now I'd want to keep it separate, to call it out as a new feature etc.

Frooodle avatar Jul 28 '25 16:07 Frooodle

Sure, no problem at all. I’ll also add a link for it. Do you have any suggestions for the text label? I can’t think of a good name. I would assume Epub to PDF, but I’d like to hear your opinion too.

0x-2FA avatar Jul 28 '25 16:07 0x-2FA

Some people prefer "Book to PDF", but I think "EPUB" is more explanatory and good for this stage.

I'd name it: Epub to PDF

Description: Convert the book format EPUB to PDF

Frooodle avatar Jul 28 '25 16:07 Frooodle

Great, on it!

Also for reference, I've tested all the epubs from here https://epubtest.org/test-books and all of them converted successfully to pdf. If you have any specific epub for testing, feel free to tell me.

0x-2FA avatar Jul 28 '25 17:07 0x-2FA

I'll do some testing once the PR is up, but otherwise I trust your testing!

Frooodle avatar Jul 28 '25 17:07 Frooodle

@0x-2FA How's it going?

Frooodle avatar Aug 09 '25 14:08 Frooodle

@Frooodle I mostly finished it back then, but I noticed that there is an issue with calibre on Alpine.

The package doesn’t work on alpine:3.22.1. I can only get it to work on alpine:edge. It probably depends on some packages that 3.22.1 doesn’t have, which enable calibre to run.

One option might be adding glibc into Alpine so we can use the prebuilt binary, but I’m not a fan of that approach.

I’m also not keen on using alpine:edge, though I noticed we add the edge repos anyway. If I understand correctly, whatever package we pull comes from edge, which explains the inconsistency and why calibre can’t run.

We might need to wait for them to include it in an official Alpine release instead of just the edge branch.

What do you think?

0x-2FA avatar Aug 09 '25 17:08 0x-2FA

Ahh, I don't want to use calibre. We previously used it and it had so many issues that I migrated away and dropped support for it.

I recommend building the EPUB handling directly instead of relying on a 3rd party tool. Another great OSS PDF tool, https://github.com/iib0011/omni-tools (although their PDF editor is not OSS; it is a closed 3rd party component), is a good example of this for PDF to EPUB. Going backwards should be possible as well due to how the format works (or at least EPUB to HTML, then re-use HTML to PDF).

Frooodle avatar Aug 09 '25 18:08 Frooodle

Oh, OK. I'll look into it then. We can either go with no 3rd party at all or use something like this: https://github.com/psiegman/epublib

0x-2FA avatar Aug 09 '25 18:08 0x-2FA

Yeah that looks good too!

Frooodle avatar Aug 09 '25 19:08 Frooodle

Hi,

Sorry to butt in to the conversation, but epublib is unmaintained and has severe limitations with more modern EPUB specs.

However, a slightly better maintained fork exists, albeit not a very popular one: https://github.com/documentnode/epub4j

@0x-2FA, can you please use that instead for now?

balazs-szucs avatar Aug 11 '25 20:08 balazs-szucs

Honestly, I doubt we even need that lib (but let's see).

Thanks for the call out!

Frooodle avatar Aug 11 '25 20:08 Frooodle

Will look into it this week. Since we already have HTML to PDF, I think we might not need the lib. But it might help with the PDF to EPUB conversion (e.g., creating the EPUB after reading the PDF content).

0x-2FA avatar Aug 11 '25 21:08 0x-2FA

Update on the issue. I feel that epub4j is too good to skip for both Epub to Pdf and Pdf to Epub.

Some things I noticed while working with Epubs this weekend (many of which I did not know before):

  1. Every epub should contain a .opf file (Open Packaging Format). This is just an XML file that, among other things, lists the order in which to display the content files.
  2. Some epubs have .html files and some others have .xhtml files.

The library is very helpful for reading the .epub file and its resources (HTML, CSS, images, fonts, and so on). It is especially useful for parsing the .opf file and constructing the "spine" of the Epub which tells us the correct order of the content files.
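
For reference, the reading order described above can also be recovered without any library. The following is a minimal sketch in Python (for brevity; it is not Stirling-PDF code and not how epub4j works internally). It assumes a standard EPUB layout where `META-INF/container.xml` points to the .opf file; the function name `spine_documents` is made up for illustration, and percent-encoded hrefs are not handled.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the .opf package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub):
    """Return content-document paths (relative to the ZIP root) in reading order.

    `epub` is a file path or a binary file-like object. Hypothetical helper.
    """
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml tells us where the .opf package document lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    # The manifest maps item ids to hrefs; hrefs are relative to the .opf file.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    opf_dir = posixpath.dirname(opf_path)
    # The spine lists item ids in the order the reader should display them.
    return [
        posixpath.normpath(posixpath.join(opf_dir, manifest[ref.attrib["idref"]]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
```

Note that the order comes from the spine, not from the manifest or from file names, which is why both .html and .xhtml files can appear in any naming scheme.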

I think we have 2 options:

  1. Do a pass with HtmlToPdf for each .html file and then merge the resulting PDFs in the order that the .opf file provides.
  2. Create a big .html file (aka merge all the html files first) and then do HtmlToPdf.

I prefer the first option. It seems safer because we do not need to modify any of the original HTML content.
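
To make the second option more concrete, here is a rough sketch in Python of what "merge all the html files first" could look like. It is an illustration only, under strong assumptions: every chapter is well-formed XHTML without named entities such as `&nbsp;` (plain .html files would need a real HTML parser), and stylesheets and other `<head>` content are ignored. `merge_chapters` and `section_id` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML tags without a prefix


def section_id(path):
    """Derive an id from a chapter path, e.g. 'text/ch1.xhtml' -> 'text-ch1-xhtml'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-")


def merge_chapters(chapters):
    """Merge (path, xhtml_source) pairs, given in spine order, into one XHTML string.

    Each chapter body is wrapped in a <section> with an id derived from its
    path, so links between chapters could later be rewritten to '#<id>'.
    """
    html = ET.Element(f"{{{XHTML_NS}}}html")
    ET.SubElement(html, f"{{{XHTML_NS}}}head")
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for path, source in chapters:
        chapter_body = ET.fromstring(source).find(f"{{{XHTML_NS}}}body")
        section = ET.SubElement(
            body, f"{{{XHTML_NS}}}section", {"id": section_id(path)}
        )
        # Move the chapter's leading text and child elements into the section.
        section.text = chapter_body.text
        section.extend(list(chapter_body))
    return ET.tostring(html, encoding="unicode")
```

The merged string would then go through the existing HTML to PDF step once, instead of once per chapter.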

0x-2FA avatar Aug 18 '25 20:08 0x-2FA

I prefer the first option. It seems safer because we do not need to modify any of the original HTML content.

I may stand corrected here, but I hard disagree.

As for the trade-offs here:

  1. Option 1 (convert each file, then merge): the PDF tool doesn't see the whole book at once, so you miss out on big-picture stuff like:

  • Spot-on page numbers for a table of contents or index
  • Headers and footers that change based on the full layout
  • Links that jump between chapters (like from page 5 to page 200)
  • Fancy CSS tricks for pages, like forcing breaks, that need the entire flow to make sense

  2. Option 2 (merge the HTML first, then convert):

  • The tool gets the full picture, so things like page numbers, headers, footers, jump links, and overall styling rules will probably come out well.
  • CSS relative links (like "grab the image from ../images/cover.jpg") would break when you combine everything.

WeasyPrint doesn't always handle relative CSS stuff perfectly anyway (it can be finicky and require extra setup like base_url flags), so that might get "ruined" either way. However, if I'm right (and I could be wrong on this; a POC would be good here), then we could at least preserve the features I listed in section 1, which, in my opinion, are more valuable overall anyway. Ebooks tend not to have that many images (I think), but they always have a table of contents, for example.
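
One possible way around the relative-link problem when merging, sketched in Python: rewrite each chapter's relative references so they are relative to the EPUB root, then give the renderer a single base location (for WeasyPrint, that would be the base_url setting mentioned above, pointing at the extracted EPUB directory). `resolve_in_epub` is a hypothetical helper and query strings are dropped for simplicity; whether this is enough to make images render correctly is untested.

```python
import posixpath
from urllib.parse import urlsplit


def resolve_in_epub(chapter_path, href):
    """Resolve `href` found in `chapter_path` to a path relative to the EPUB root.

    Absolute URLs (http:, data:, ...) and pure fragments are returned unchanged.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc or not parts.path:
        return href
    # Join against the chapter's own directory, then collapse any "../" parts.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(chapter_path), parts.path)
    )
    return resolved + ("#" + parts.fragment if parts.fragment else "")
```

For example, `../images/cover.jpg` referenced from `OEBPS/text/ch1.xhtml` becomes `OEBPS/images/cover.jpg`, which stays valid no matter where the merged file is placed relative to the chapters.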

I did the EML-to-PDF conversion, and I can tell you 100% that WeasyPrint does not play well with images, no matter what magic you try. However, indexes, links, headers, footers, and chapter jumps should work well if you go with the second route (but it might need some love in your Java code, e.g., some fixing here and there).

Obviously, I am a 3rd party here, and I am assuming you/Frooodle will have the final say, but if it were up to me, I would go with option 2, 100%.

I think a quick/messy proof of concept may not be that hard, so that is also there to settle the "argument".

(Sorry to dump this work on you like this; it does feel a bit dirty, since option 2 is probably harder, but it would yield much better results, I think at least.)

Sorry about the grammar, it should be fixed now.

balazs-szucs avatar Aug 18 '25 21:08 balazs-szucs

Here is a quick example:

  1. The HTML
Image
  2. The PDF WeasyPrint manages to create
Image

I can tell you, I tried everything to fix this, and I don't think it is possible to make it work. I am assuming you'll most likely get a similar result.

This is, BTW, after I did some adjustments to the CSS; if I hadn't touched it, it would be even worse, with half of the picture out of the "frame".

But links, BTW, work perfectly in the footer 😄 so at least that works reliably.

balazs-szucs avatar Aug 18 '25 22:08 balazs-szucs

For HTML to PDF we really need a web-based renderer solution. Wkhtmltopdf used to be good, but it's unsupported and has too many security issues.

Chromium or similar could be interesting 🤔
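
As a rough idea of what the Chromium route could look like, here is a small Python sketch that builds a headless print-to-PDF command. The binary name `chromium` is an assumption (it varies by distribution, e.g. `chromium-browser` or `google-chrome`), the helper names are hypothetical, and any sandbox or container-specific flags are left out.

```python
import subprocess


def chromium_pdf_args(html_path, pdf_path, binary="chromium"):
    """Build the argument list for a headless Chromium print-to-PDF run."""
    return [binary, "--headless", f"--print-to-pdf={pdf_path}", html_path]


def html_to_pdf(html_path, pdf_path, binary="chromium"):
    """Run Chromium; raises CalledProcessError if the conversion fails."""
    subprocess.run(chromium_pdf_args(html_path, pdf_path, binary), check=True)
```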

Frooodle avatar Aug 18 '25 22:08 Frooodle

Hey, thanks for the feedback. That’s the reason I shared an update in the first place, and you’re more than welcome to add suggestions or ideas anytime. I will try both options and provide another update. The second option might also be more efficient; I was just thinking it might not be worth it because we would need to dig in and do some manual work to merge the various HTML files.

I’ll look into it and update you again. Since I haven’t yet tested the actual conversion with WeasyPrint, I will also check whether any issues with images or CSS arise. Tbh I’m not really concerned about the CSS part since (in most cases) it is just a single file where they set the font family or text weight etc., nothing too advanced. The images part will be the most interesting, especially after reading your comment 😆

Also, regarding Frooodle’s comment, that’s the reason I initially went with Calibre, so we wouldn’t have to handle all of this ourselves. But, as you already know, it had some issues with Alpine 😅

Edit: Added info on the css/images part.

0x-2FA avatar Aug 18 '25 22:08 0x-2FA

Hi,

I was researching something unrelated when I stumbled across this: https://flothesof.github.io/pdf-conversion-kindle3.html

Ghostscript can output PDFs that are optimized for Kindle and other book readers. I'd love this as an optimization in the conversion if possible. I haven't done that much research into this, but I think it is a very worthwhile option for our book/comic formats. I plan to adjust the CBZ/CBR converter to use this too (in the future), because from what I can see it is very good.

It does have a lot of parameters, which makes it a bit hard to use imho, so don't feel like you're forced to use it if you initially get "bad" results. (It can also be thread safe, since it can process each "page" separately, so people don't complain about performance 😆)

We might also be able to use this as a pipeline step or something, e.g., "Optimize PDF for ebook reading" (but this is very much up for discussion; I don't want to force anything here). Since I personally read a lot of online stuff, I'll also be testing privately, but I have high hopes. :)
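
To give an idea of the kind of Ghostscript call involved, here is a minimal Python sketch. It does not reproduce the parameters from the linked article; it uses Ghostscript's built-in `/ebook` preset of the `pdfwrite` device as a stand-in, and the helper names are hypothetical. Kindle-specific tuning (page size, cropping, and so on) would need more flags than shown here.

```python
import subprocess


def ghostscript_ebook_args(src, dst, preset="/ebook"):
    """Build a Ghostscript command that rewrites `src` into `dst` with a preset."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",         # write a PDF as output
        f"-dPDFSETTINGS={preset}",   # built-in quality/size preset
        "-dNOPAUSE",                 # do not prompt between pages
        "-dBATCH",                   # exit when processing is done
        "-dQUIET",                   # suppress routine messages
        f"-sOutputFile={dst}",
        src,
    ]


def optimize_pdf(src, dst):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(ghostscript_ebook_args(src, dst), check=True)
```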

balazs-szucs avatar Sep 24 '25 15:09 balazs-szucs

Anyway, giving it a bit more thought, this might be better as a long-term enhancement, but still, keep it in mind.

balazs-szucs avatar Sep 24 '25 15:09 balazs-szucs

@0x-2FA any news?

balazs-szucs avatar Oct 06 '25 15:10 balazs-szucs

Hey, sorry I haven’t looked at this in a while but I'll review it again this week. I remember I was really close.

0x-2FA avatar Oct 06 '25 22:10 0x-2FA

Hi,

No worries, I don't think this is urgent by any means. I just have a library I want to migrate/archive 😄

balazs-szucs avatar Oct 06 '25 22:10 balazs-szucs