Balearica

Results 43 issues of Balearica

Creating this issue to gauge interest in a desktop application using Electron. Please "thumbs up" this issue if this is something you would use over the site. Additionally, if you...

The "line" element produced by mupdf sometimes includes multiple (actual) lines. Below is an example where 2 distinct lines are combined into a single `line` element.

Tesseract does not perform well when the text in an input image is at a nonzero angle. However, Tesseract is also currently used to determine the text angle. Therefore, we run Tesseract...

Terms such as `U.S.`, `e.g.` and `i.e.` are consistently misidentified by Tesseract Legacy, usually as `US.`, `eg.` and `ie.` (respectively). This appears to be because Tesseract's language model does not...
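The issue text is cut off before any fix is described, so the following is only a minimal sketch of one possible workaround: correcting the three misreadings named above after recognition. The `ABBREVIATION_FIXES` table and the `restoreAbbreviations` function are hypothetical names for illustration and are not part of Tesseract or Tesseract.js.

```javascript
// Hypothetical post-processing table (an assumption, not a Tesseract feature).
// The entries are taken from the misreadings listed in the issue text.
const ABBREVIATION_FIXES = new Map([
  ["US.", "U.S."],
  ["eg.", "e.g."],
  ["ie.", "i.e."],
]);

// Replace whole whitespace-delimited tokens that exactly match a known misreading.
function restoreAbbreviations(text) {
  return text
    .split(/(\s+)/) // the capture group keeps the original whitespace in the result
    .map((token) => ABBREVIATION_FIXES.get(token) ?? token)
    .join("");
}
```

A token-level replacement like this cannot tell a misread `U.S.` from a sentence that legitimately ends in `US.`, so it would probably need a confidence check or a manual review step before being relied on.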

Punctuation patterns that are both grammatically correct and common in English are consistently misidentified by Tesseract Legacy. They appear to be missing from the list of acceptable punctuation patterns, which...

Tesseract uses Leptonica image thresholding functions to produce a binary image that the OCR engine is run on. At present this step uses all of the default options and is...

Tesseract.js currently accepts any valid image, and does not downsize large images. Additionally, while the memory allocated for the WebAssembly "heap" can increase if needed, it cannot decrease....
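Because large images are not downsized automatically, a caller may want to shrink them before recognition. The sketch below shows one way to compute target dimensions that preserve the aspect ratio while staying under a pixel budget. The function name `fitWithinPixelBudget` and the idea of a fixed budget are assumptions made for illustration, not something the issue text proposes or Tesseract.js provides.

```javascript
// Hypothetical helper (not part of the Tesseract.js API).
// Returns dimensions with the same aspect ratio whose pixel count does not exceed maxPixels.
function fitWithinPixelBudget(width, height, maxPixels) {
  if (!(width > 0 && height > 0 && maxPixels > 0)) {
    throw new RangeError("width, height, and maxPixels must be positive");
  }
  const pixels = width * height;
  if (pixels <= maxPixels) {
    // Already within budget: leave the image untouched.
    return { width, height, scale: 1 };
  }
  // Scaling both sides by sqrt(maxPixels / pixels) scales the area by maxPixels / pixels.
  const scale = Math.sqrt(maxPixels / pixels);
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
    scale,
  };
}
```

The returned dimensions could then be passed to whatever resizing step the application already uses (for example, drawing onto a canvas in the browser) before the image is handed to the recognizer. A suitable budget depends on the text size in the image, since shrinking too far can hurt recognition accuracy.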

At present, we have 2 interfaces for running recognition: schedulers and workers. Both interfaces are explained in [this guide](https://github.com/naptha/tesseract.js/blob/master/docs/workers_vs_schedulers.md). I plan on combining them into a single interface, taking the...

Tesseract.js uses two sets of language data by default. When the `oem` is set to the default (LSTM only), integerized versions of `tessdata_best` (LSTM only data) are used. When `oem`...

The `blocks` output format includes various font attributes on the `word` level, including `is_italic` and `is_serif`. These do not appear to be functioning properly, and seem to always return `false`,...