rmast
This mod seems to do a slightly better job, still not flawless... 
Concerning Tesseract builds and duplicate bookkeeping, I guess you were also involved in the version 5 upgrade of the naptha repo? Are you aware of any instructions for porting new versions...
Anyway, I was able to rebuild the vanilla version of Tesseract by rolling back the non-vanilla changes in glue.js and glue.cpp and altering build-scripts/var.sh to do the vanilla build. I...
For some reason the change you made on the master branch almost two months ago doesn't show up on scribeocr.com. I'll try to build it myself.
I can't get the blocked é to work with master. I'll apply my patch to make sure I've mastered the whole pipeline, including clearing node_modules and the browser cache.
"één" is still not coming through from commit d015ef9afff63900d59b08291891fcbdc10f8c91, but my grasp of the build-and-show process, with copying stuff from other repos, is clearly good enough, as I have the "print" on...
I will just bluntly copy the vanilla versions over the scribeocr version to see whether it really is within Tesseract itself instead of in its config. Line 75 of the LSTM...
And indeed, the scribeocr-version of Tesseract also contains a bias towards één. I'll try to build and debug that standalone.
git bisect has revealed the first commit biased towards "één" at the start of a sentence: https://github.com/Balearica/tesseract/commit/c646b3643719391aae924a53e7325c20268e4b9c In that commit the cause is: finder = new ColumnFinder(static_cast<int>(to_block->line_size **/ 2**), blkbox.botleft(),...
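For anyone who wants to repeat the bisect, here is a minimal sketch of a check script that `git bisect run` could call. It is not the script I actually used: the file name `check_een.py`, the helper `verdict`, and the assumption that the test image's first word should *not* come out as "één" are all hypothetical. It only relies on the standard CLI form `tesseract <image> stdout -l nld` and on the exit codes `git bisect run` understands (0 = good, 1 = bad, 125 = skip).

```python
#!/usr/bin/env python3
"""Hypothetical check script for `git bisect run`; all names are assumptions."""
import subprocess
import sys
import unicodedata

BIASED_WORD = "één"          # word the biased builds produce at a sentence start
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def verdict(ocr_text: str) -> int:
    """Return BAD if the first recognised word is the biased one, else GOOD."""
    words = unicodedata.normalize("NFC", ocr_text).split()
    if not words:
        return SKIP  # nothing recognised, so this commit cannot be judged
    first = words[0].strip(".,:;!?\"'").lower()
    return BAD if first == BIASED_WORD else GOOD


def main(tesseract_bin: str, image: str) -> int:
    # `tesseract <image> stdout -l nld` prints the recognised text to stdout.
    try:
        result = subprocess.run(
            [tesseract_bin, image, "stdout", "-l", "nld"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return SKIP  # binary missing or crashed: let bisect skip this commit
    return verdict(result.stdout)


# Only act when called as: check_een.py <tesseract-binary> <image>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

It could be driven with `git bisect run python3 check_een.py <tesseract-binary> <image>`, provided each bisect step rebuilds the binary first; the build step itself is left out here because it depends on the local setup.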
I'm struggling to build the vanilla build. scribeocr/tesseract.js-core only contains a master branch with a vars.sh that mentions a vanilla branch. The master branch contains a vars.h which overrides any...