gImageReader: Add export to EPUB/FB2 formats

FineReader supports exporting to the EPUB and FB2 formats. For these formats it offers the following settings:

Document layout: plain text or formatted text. Keep pictures: Best quality, Compact size, or Custom.

Custom dialog looks like:

Do you know any libraries for working with epub/fb2 formats?

Nov 13 '17 19:11 zamazan4ik

After searching the Internet, I didn't find any good SDK for EPUB and FB2 rendering. The best one I found is Readium SDK: https://github.com/readium/readium-sdk, but the quality of the library is terrible. For FB2 I didn't find anything useful.

Another option is to try using Pandoc to convert from some other format to FB2 and EPUB. I didn't find any C++ bindings for it, so this still needs to be researched.
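Without bindings, one way to try Pandoc is to call its command-line tool as a subprocess. Below is a minimal sketch in Python, assuming the OCR result is already available as HTML files and that `pandoc` is on the PATH. The helper names `build_pandoc_command` and `convert` and the file names are hypothetical; only the `-f`, `-t` and `-o` options and the `epub`, `epub3` and `fb2` output formats come from Pandoc itself.

```python
import shutil
import subprocess

# Output formats of interest for this issue (all are pandoc writer names).
SUPPORTED_FORMATS = ("epub", "epub3", "fb2")


def build_pandoc_command(inputs, output, to_format):
    """Return the argv list for converting HTML inputs with the pandoc CLI."""
    if to_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {to_format}")
    # pandoc concatenates multiple input files into one document.
    return ["pandoc", "-f", "html", "-t", to_format, "-o", output, *inputs]


def convert(inputs, output, to_format):
    """Run pandoc; raises if pandoc is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_pandoc_command(inputs, output, to_format), check=True)
```

A call such as `convert(["page1.html", "page2.html"], "book.fb2", "fb2")` would then produce a single FB2 file. The same approach should carry over to C++ by spawning the process from there, for example with `QProcess`, though that is untested here.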

Nov 21 '17 09:11 zamazan4ik

There are some filters used by LibreOffice (EPUB 3 export was recently added: https://vmiklos.hu/blog/basic-epub3-export.html). Maybe this is of use here?

Nov 27 '17 12:11 narayaan

hmm, thank you :-) I'll check it.

Nov 27 '17 19:11 zamazan4ik

This recently got some updates; the following was posted on the TDF website:

libe-book exports LibreOffice ODT files to EPUB3. At the moment it offers just basic features, but development is still undergoing and new features will be added before the next major release. The library can be downloaded from https://sourceforge.net/projects/libebook/. A description of the architecture and the features is available here: https://vmiklos.hu/blog/basic-epub3-export.html.

Jan 22 '18 21:01 narayaan

I guess this is the more recent blog report:

https://vmiklos.hu/blog/epub3-improvements-2.html

Jan 22 '18 21:01 narayaan

There is hOCR Tools, a set of utilities that each do one job on hOCR files. Those of interest for ePub creation may be:

hocr-check -- check the hOCR file for errors
hocr-combine -- combine pages in multiple hOCR files into a single document
hocr-cut -- cut a page (horizontally) into two pages in the middle
hocr-extract-images -- extract the images and texts within all the ocr_line elements
hocr-lines -- extract the text within all the ocr_line elements
hocr-merge-dc -- merge Dublin Core meta data into the hOCR HTML header
hocr-split -- split an hOCR file into individual pages

Basically you'd just need hocr-merge-dc and hocr-split. The results you could then feed to Pandoc, which would compose the ePub.
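To illustrate what these tools operate on: hOCR is plain HTML in which each recognized line is an element with the class `ocr_line`. The following is a rough sketch of the kind of extraction that hocr-lines performs, written with Python's standard `html.parser`. It is not the hOCR Tools implementation, and the names `OcrLineExtractor` and `extract_lines` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class OcrLineExtractor(HTMLParser):
    """Collect the text of every element whose class list contains 'ocr_line'."""

    def __init__(self):
        super().__init__()
        self.lines = []   # finished lines, in document order
        self._depth = 0   # nesting depth inside the current ocr_line (0 = outside)
        self._parts = []  # text fragments of the line being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "ocr_line" in classes:
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            self.lines.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_lines(hocr_html):
    """Return the text of all ocr_line elements in an hOCR string."""
    parser = OcrLineExtractor()
    parser.feed(hocr_html)
    parser.close()
    return parser.lines
```

The extracted lines could serve as the "plain text" layout, while the unmodified hOCR HTML would be the input for a formatted export.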

Mar 15 '19 20:03 bmix
