gImageReader

Add export to EPUB/FB2 formats

Open • zamazan4ik opened this issue 7 years ago • 6 comments

FineReader supports exporting to EPUB/FB2. For these formats, FineReader offers the following settings: [screenshot]

  • Document layout: plain text or formatted text.
  • Keep pictures: Best quality, Compact size, or Custom.

The Custom dialog looks like this: [screenshot]

Do you know of any libraries for working with EPUB/FB2 formats?

zamazan4ik, Nov 13 '17 19:11

After searching the Internet, I didn't find any good SDK for EPUB and FB2 rendering. The best one is the Readium SDK (https://github.com/readium/readium-sdk), but the quality of the library is terrible. For FB2, I didn't find anything useful.

Another option is to use Pandoc to convert some intermediate format to FB2 and EPUB. I didn't find any C++ bindings for it; this needs more research.
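Without bindings, one workaround is to run the pandoc command-line tool as an external process. A minimal sketch, written in Python for brevity and assuming that `pandoc` is on the PATH and that the recognized text is available as HTML; the function names and file names are hypothetical. It uses only pandoc's `--from`, `--to` and `--output` options and its `epub3` and `fb2` writers. From C++, the same argument list could be passed to a process API such as Qt's `QProcess`.

```python
import shutil
import subprocess


def pandoc_command(src: str, dst: str, to_format: str) -> list[str]:
    """Build the argument list for converting an HTML file with pandoc.

    `to_format` is a pandoc writer name such as "epub3" or "fb2".
    Passing a list (not a shell string) avoids any quoting problems.
    """
    return ["pandoc", "--from", "html", "--to", to_format, "--output", dst, src]


def convert(src: str, dst: str, to_format: str) -> None:
    """Run pandoc as a child process; raises CalledProcessError if it fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(src, dst, to_format), check=True)


if __name__ == "__main__":
    # Hypothetical file names; only print the commands that would be run.
    for fmt, out in (("epub3", "book.epub"), ("fb2", "book.fb2")):
        print(" ".join(pandoc_command("page.html", out, fmt)))
```

Calling `convert("page.html", "book.fb2", "fb2")` would then write an FB2 file, provided pandoc can make sense of the input HTML.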

zamazan4ik, Nov 21 '17 09:11

There are some filters used by LibreOffice (EPUB 3 export was recently added: https://vmiklos.hu/blog/basic-epub3-export.html). Maybe these are of use here?
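If the OCR result can first be saved as an ODT file, the LibreOffice filter could be reused without linking against it, by running LibreOffice headless. A minimal sketch, assuming the installed LibreOffice ships the EPUB export filter described in the linked blog post and that `soffice --convert-to epub` selects it; the function names are hypothetical. This covers EPUB only, not FB2.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_epub_command(odt_path: str, out_dir: str) -> list[str]:
    """Argument list for a headless LibreOffice ODT -> EPUB conversion.

    Assumes the installed LibreOffice ships the EPUB export filter.
    """
    return ["soffice", "--headless", "--convert-to", "epub",
            "--outdir", out_dir, odt_path]


def expected_epub_path(odt_path: str, out_dir: str) -> Path:
    """LibreOffice names the result after the input file, with a new extension."""
    return Path(out_dir) / (Path(odt_path).stem + ".epub")


def convert_odt_to_epub(odt_path: str, out_dir: str) -> Path:
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice (LibreOffice) was not found on PATH")
    # check=True only reports a non-zero exit status, so also verify the file.
    subprocess.run(soffice_epub_command(odt_path, out_dir), check=True)
    result = expected_epub_path(odt_path, out_dir)
    if not result.exists():
        raise RuntimeError(f"LibreOffice did not produce {result}")
    return result
```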

narayaan, Nov 27 '17 12:11

hmm, thank you :-) I'll check it.

zamazan4ik, Nov 27 '17 19:11

This recently got some updates; the following was posted on the TDF website:

libe-book exports LibreOffice ODT files to EPUB3. At the moment it offers just basic features, but development is still undergoing and new features will be added before the next major release. The library can be downloaded from https://sourceforge.net/projects/libebook/. A description of the architecture and the features is available here: https://vmiklos.hu/blog/basic-epub3-export.html.

narayaan, Jan 22 '18 21:01

I guess this is the more recent blog post:

https://vmiklos.hu/blog/epub3-improvements-2.html

narayaan, Jan 22 '18 21:01

There are the hOCR Tools: utilities that each perform one job on hOCR files. Those of interest for EPUB creation may be:

  • hocr-check -- check the hOCR file for errors
  • hocr-combine -- combine pages in multiple hOCR files into a single document
  • hocr-cut -- cut a page (horizontally) into two pages in the middle
  • hocr-extract-images -- extract the images and texts within all the ocr_line elements
  • hocr-lines -- extract the text within all the ocr_line elements
  • hocr-merge-dc -- merge Dublin Core meta data into the hOCR HTML header
  • hocr-split -- split an hOCR file into individual pages

Basically, you'd just need hocr-merge-dc and hocr-split. You could then feed the results to Pandoc, which would compose the EPUB.
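The hOCR Tools + Pandoc route could look roughly like this, assuming hocr-split has already written one HTML file per page with zero-padded names (the `page-*.html` pattern and the function names are hypothetical). Pandoc joins multiple input files into one document in the order given, so the pages are sorted by name first. The title is passed explicitly with pandoc's `--metadata` option rather than assuming pandoc picks up the Dublin Core fields written by hocr-merge-dc.

```python
import shutil
import subprocess
from pathlib import Path


def epub_command(page_dir: str, pattern: str, title: str, dst: str) -> list[str]:
    """Build a pandoc command that joins split hOCR pages into one EPUB.

    Pages are sorted by file name, so the names must be zero-padded
    (page-002 before page-010).
    """
    pages = sorted(str(p) for p in Path(page_dir).glob(pattern))
    if not pages:
        raise FileNotFoundError(f"no files matching {pattern!r} in {page_dir}")
    return ["pandoc", "--from", "html", "--to", "epub3",
            "--metadata", f"title={title}", "--output", dst, *pages]


def compose_epub(page_dir: str, pattern: str, title: str, dst: str) -> None:
    """Run the command built above; raises CalledProcessError on failure."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(epub_command(page_dir, pattern, title, dst), check=True)
```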

bmix, Mar 15 '19 20:03