tesseract
Support image width and height larger than 32767
Signed-off-by: Stefan Weil
Please don't merge until these changes have been tested better.
The modifications here allow processing of large images, which fixes issue #3184. I only tested the example in that issue and did not run performance tests or measure the increased memory usage.
Known issues: code setting x / y / width / height to INT16_MAX or -INT16_MAX needs modifications, too.
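A sketch of why such hardcoded values break once coordinates are 32 bits wide. It assumes a hypothetical alias Coord for the widened type (not a name from the Tesseract sources). A minimum search that starts at INT16_MAX never drops below 32767 when every coordinate is larger, while start values taken from std::numeric_limits follow the type automatically.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical alias for the widened coordinate type (int16_t before the change).
using Coord = int32_t;

struct Extent {
  Coord min;
  Coord max;
};

// Old pattern: start values hardcoded to the 16-bit range.
// If every x is larger than 32767, min stays at INT16_MAX, which is wrong.
Extent extent_hardcoded(const std::vector<Coord>& xs) {
  Extent e{INT16_MAX, -INT16_MAX};
  for (Coord x : xs) {
    e.min = std::min(e.min, x);
    e.max = std::max(e.max, x);
  }
  return e;
}

// Fixed pattern: start values derived from the coordinate type itself,
// so they stay correct if Coord changes again.
Extent extent_from_type(const std::vector<Coord>& xs) {
  Extent e{std::numeric_limits<Coord>::max(), -std::numeric_limits<Coord>::max()};
  for (Coord x : xs) {
    e.min = std::min(e.min, x);
    e.max = std::max(e.max, x);
  }
  return e;
}
```

For x coordinates {40000, 50000}, the hardcoded version reports a minimum of 32767, while the type-derived version correctly reports 40000.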
I wonder about the impact on regular size images.
How much more memory (in percent) will be consumed with this patch?
I don't have any numbers yet.
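There are no measurements in this thread, but the per-field cost can be illustrated: each coordinate stored as int32_t instead of int16_t needs 4 bytes instead of 2. The structs below are hypothetical boxes, not Tesseract types. How this translates into an overall percentage depends on how much of the working memory actually holds coordinates, and in structs with wider members, existing padding may absorb part of the growth.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bounding boxes with four coordinates each.
struct Box16 { int16_t left, top, right, bottom; };  // before: 16-bit coordinates
struct Box32 { int32_t left, top, right, bottom; };  // after: 32-bit coordinates

// Extra bytes needed to store n boxes with 32-bit instead of 16-bit coordinates.
constexpr std::size_t extra_bytes(std::size_t n) {
  return n * (sizeof(Box32) - sizeof(Box16));
}
```

On common platforms Box16 takes 8 bytes and Box32 takes 16, so each box doubles in size.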
This pull request introduces 3 alerts when merging eb8f13bea8adba3319413e78a6b67a8afb90b48b into 93348a83a324a479978d9dd399b34d15ec6c5d83
new alerts:
- 3 for Comparison of narrow type with wide type in loop condition
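This alert flags loops whose counter is narrower than the bound it is compared with. A minimal sketch, again assuming a hypothetical alias Coord = int32_t: an int16_t counter can only reach bounds up to INT16_MAX. For a wider image, ++x wraps around (to -32768 on common platforms) before x < width becomes false, so the loop never ends. Giving the counter the same type as the bound avoids this.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical alias for the widened dimension type.
using Coord = int32_t;

// Would `for (int16_t x = 0; x < width; ++x)` terminate?
// Only if width fits into int16_t; otherwise x wraps around before reaching width.
constexpr bool narrow_counter_terminates(Coord width) {
  return width <= std::numeric_limits<int16_t>::max();
}

// Fixed loop: the counter has the same type as the bound it is compared with.
Coord count_columns(Coord width) {
  Coord visited = 0;
  for (Coord x = 0; x < width; ++x) {
    ++visited;  // per-column work would go here
  }
  return visited;
}
```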
You changed int16->int32 a lot.
Shouldn't that be a typedef (using) to allow for possible future changes?
Yes, the current Tesseract code uses int16_t for image dimensions, which I replaced by int32_t.
I considered using a typedef, but I don't think there will be a future change for which a typedef would help. What kind of change are you thinking of?
I agree, this may be subtle.
But from a programmer's point of view, seeing coord_t (or something like that) instead of just int32_t annotates the code much better.
You may have noticed that Tesseract has a lot of int variables whose meaning is hard to understand when navigating the code.
coord_t would violate the standards, which reserve types ending with "_t".
I now use TDimension.
violate the standards
What standards?
ISO C, see https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html.
Should we make support for large images an optional feature in release 5.0.0 by adding a configure option, for example --enable-large-images?
Or just leave them out. If something big does not fit into 5.0.0 soon enough, just postpone it.
If you choose to do it for 5.0.0, I suggest adding this to the help message that explains the option:
Experimental feature (use it only for testing purposes)
and in a comment in the code itself:
For package builders: We recommend not to enable this option because the feature is unstable/untested.
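One way such an option could work, sketched under assumptions: the configure switch defines a macro (LARGE_IMAGES here is a hypothetical name, not taken from Tesseract's build system), and a single TDimension alias selects the coordinate type from it. This is also the kind of change where a type alias pays off, since only one line decides the width. The definition below is illustrative and not Tesseract's actual header.

```cpp
#include <cstdint>
#include <limits>

// LARGE_IMAGES is a hypothetical macro that an --enable-large-images configure
// switch could define, for example through config.h or -DLARGE_IMAGES.
#ifdef LARGE_IMAGES
using TDimension = int32_t;  // experimental: images larger than 32767 pixels
#else
using TDimension = int16_t;  // default: current 16-bit limit
#endif

// Largest width or height this build can represent.
constexpr long max_image_dimension() {
  return std::numeric_limits<TDimension>::max();
}

// Check an image size before processing it instead of silently truncating it.
constexpr bool image_fits(long width, long height) {
  return width <= max_image_dimension() && height <= max_image_dimension();
}
```

Built without the macro, image_fits(40000, 1000) returns false; built with -DLARGE_IMAGES, it returns true.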
Most parts of the initial pull requests are now in main, so support for larger images is prepared.
@stweil Hi, has this feature been merged into the main branch? Why don't I see this change in the source code at tag 5.3.0?
No, it isn't merged. This pull request is still a draft.
It has been some years already. What's the reason?