Leptonica pixaReadMemMultipageTiff method

Open Hiale opened this issue 4 years ago • 5 comments

As mentioned in this comment, I implemented PixArray.LoadMultiPageTiffFromMemory(byte[]).

Hiale avatar Jul 12 '21 22:07 Hiale

Hi @charlesw, we need the LoadMultiPageTiffFromMemory function. Is this pull request ready to be merged? https://github.com/charlesw/tesseract/pull/562/commits

seabird86 avatar Nov 07 '22 01:11 seabird86

The pull request looks fine. Unfortunately I ran out of time tonight (I'm just trying to update the dev branch to 5.2, then I'll merge this change in).

charlesw avatar Nov 07 '22 11:11 charlesw

Are there any advantages to passing in a multi-page image? Going by separation of responsibilities, I would have expected an OCR module to focus on OCR. Consequently, there would only be flat interfaces for simple sources, such as single-page objects. In my opinion, preparing a multi-page image (splitting it?) does not belong there.

Micke3rd avatar Nov 09 '22 08:11 Micke3rd

Using Leptonica to read images directly can have significant performance advantages.

tdhintz avatar Nov 09 '22 11:11 tdhintz

"can have" is not enough. Basically, a multi-page image processed as a single block takes longer than when its individual pages are processed in parallel. That means someone would first have to investigate how Leptonica processes it. And all that just to save yourself a split?

Micke3rd avatar Nov 23 '22 11:11 Micke3rd
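
For readers weighing the "split first" argument above, it may help to see what a multi-page TIFF held in memory looks like structurally. The following is a minimal Python sketch, based only on the classic TIFF layout (an 8-byte header followed by a linked chain of image file directories, one per page). The function name `tiff_page_offsets` is hypothetical, it does not handle BigTIFF, and it is not a description of how Leptonica's pixaReadMemMultipageTiff or the proposed PixArray.LoadMultiPageTiffFromMemory is implemented.

```python
import struct


def tiff_page_offsets(data: bytes) -> list:
    """Return the byte offset of each IFD (one per page) in a classic TIFF buffer.

    Hypothetical helper for illustration only; BigTIFF (magic 43) is rejected.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF")

    # The first two bytes select the byte order for everything that follows.
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")

    # Header: 2-byte magic number (42) and 4-byte offset of the first IFD.
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")

    offsets = []
    seen = set()
    while offset != 0:
        # Guard against loops and offsets that point outside the buffer.
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        offsets.append(offset)

        # Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
        (count,) = struct.unpack_from(endian + "H", data, offset)
        next_pos = offset + 2 + 12 * count
        if next_pos + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack_from(endian + "I", data, next_pos)

    return offsets
```

Calling `len(tiff_page_offsets(buffer))` gives the page count without decoding any pixel data. Whether decoding all pages in one call or splitting and processing them in parallel is faster is exactly the open question in this thread, and the sketch does not settle it.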