gImageReader
Add export to EPUB/FB2 formats
FineReader supports exporting to EPUB and FB2. For these formats, FineReader offers the following settings:
- Document layout: plain text or formatted text.
- Keep pictures: Best quality, Compact size, or Custom.
The Custom dialog looks like this: [screenshot not included]
Do you know of any libraries for working with the EPUB/FB2 formats?
After searching the Internet, I didn't find any good SDK for EPUB and FB2 rendering. The best one is the Readium SDK (https://github.com/readium/readium-sdk), but the quality of the library is terrible. For FB2 I didn't find anything useful.
Another option is to use Pandoc to convert some intermediate format to FB2 and EPUB. I didn't find any C++ bindings for it, so this needs more research.
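Since no C++ bindings turned up, one workaround is to run Pandoc as an external process. Below is a minimal Python sketch of that idea. It assumes Pandoc is installed and on the PATH and that the input is HTML (hOCR is HTML). The helper names `pandoc_command` and `convert` are hypothetical. `epub3` and `fb2` are Pandoc's output format names, and `--from`, `--to`, and `--output` are standard Pandoc options.

```python
import shutil
import subprocess
from pathlib import Path

# Pandoc writer names for the two target formats (see `pandoc --list-output-formats`).
PANDOC_WRITERS = {"epub": "epub3", "fb2": "fb2"}


def pandoc_command(source: Path, target: str, output: Path,
                   source_format: str = "html") -> list[str]:
    """Build the argument list for converting `source` to EPUB or FB2 (hypothetical helper)."""
    if target not in PANDOC_WRITERS:
        raise ValueError(f"unsupported target format: {target!r}")
    return [
        "pandoc",
        str(source),
        "--from", source_format,
        "--to", PANDOC_WRITERS[target],
        "--output", str(output),
    ]


def convert(source: Path, target: str, output: Path) -> None:
    """Run Pandoc; raises if the executable is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc executable not found on PATH")
    subprocess.run(pandoc_command(source, target, output), check=True)
```

For example, `convert(Path("scan.html"), "fb2", Path("scan.fb2"))`. A C++ caller could pass the same argument list to a process API such as Qt's `QProcess` instead of linking against Pandoc.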
There are some filters used by LibreOffice (EPUB 3 export was recently added: https://vmiklos.hu/blog/basic-epub3-export.html). Maybe they are of use here?
Hmm, thank you :-) I'll check it out.
This recently got some updates; this was posted on the TDF website:
libe-book exports LibreOffice ODT files to EPUB3. At the moment it offers just basic features, but development is still ongoing and new features will be added before the next major release. The library can be downloaded from https://sourceforge.net/projects/libebook/. A description of the architecture and the features is available here: https://vmiklos.hu/blog/basic-epub3-export.html.
I guess this is the more recent blog post:
https://vmiklos.hu/blog/epub3-improvements-2.html
There are the hOCR Tools, a set of utilities that each perform one task on hOCR files. Those of interest for EPUB creation may be:
- hocr-check -- check the hOCR file for errors
- hocr-combine -- combine pages in multiple hOCR files into a single document
- hocr-cut -- cut a page (horizontally) into two pages in the middle
- hocr-extract-images -- extract the images and texts within all the ocr_line elements
- hocr-lines -- extract the text within all the ocr_line elements
- hocr-merge-dc -- merge Dublin Core meta data into the hOCR HTML header
- hocr-split -- split an hOCR file into individual pages
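To illustrate the kind of job a tool like hocr-lines does, here is a minimal standard-library Python sketch that collects the text of every `ocr_line` element. It is not the hOCR Tools implementation. It assumes words are nested elements inside each line (e.g. `ocrx_word` spans, as Tesseract emits) and treats every nested tag as a word boundary. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag in HTML; they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class OcrLineExtractor(HTMLParser):
    """Collect the text inside each element whose class list contains 'ocr_line'."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self._depth = 0              # nesting depth inside the current ocr_line (0 = outside)
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            if self._depth:
                self._buffer.append(" ")  # e.g. <br> separates words
            return
        if self._depth:
            self._depth += 1
            self._buffer.append(" ")      # nested tag = word boundary
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_line" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Line closed: collapse runs of whitespace into single spaces.
            self.lines.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def ocr_lines(hocr: str) -> list[str]:
    """Return the plain text of each ocr_line, in document order."""
    parser = OcrLineExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.lines
```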
Basically, you would only need hocr-merge-dc and hocr-split. You could then feed the results to Pandoc, which would compose the EPUB.
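On the Pandoc side of that pipeline, Pandoc concatenates multiple input files in the order given, and for EPUB output its `--epub-metadata=FILE` option reads Dublin Core elements such as `<dc:title>` from a separate XML file. A minimal Python sketch under those assumptions follows. The page file names stand in for whatever hocr-split produces, the helper names are hypothetical, and the metadata file only applies to EPUB, not FB2.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {"contributor", "coverage", "creator", "date", "description",
               "format", "identifier", "language", "publisher", "relation",
               "rights", "source", "subject", "title", "type"}


def dublin_core_xml(metadata: dict[str, str]) -> str:
    """Render e.g. {"title": "X"} as <dc:title>X</dc:title>, one element per line."""
    for name in metadata:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
    return "\n".join(
        f"<dc:{name}>{escape(value)}</dc:{name}>" for name, value in metadata.items()
    )


def epub_command(pages: list[Path], metadata_file: Path, output: Path) -> list[str]:
    """Build a Pandoc call that joins the per-page HTML files into one EPUB."""
    return [
        "pandoc",
        *(str(page) for page in pages),   # inputs are concatenated in this order
        "--from", "html",
        "--to", "epub3",
        f"--epub-metadata={metadata_file}",
        "--output", str(output),
    ]
```

The metadata string would be written to the file passed as `metadata_file` before running the command, for example with `Path("dc.xml").write_text(dublin_core_xml({"title": "My scan", "language": "en"}))`.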