tesseract.js

Add a way to set "Init Only" parameters (user_word_suffix, etc.)

clnoel opened this issue on May 16, 2022.

I would like to use the command-line parameters "user_words_suffix", "load_freq_dawg", and "load_system_dawg". After digging through the documentation and the code, I realized that these are "init only" parameters. In the TessBaseAPI code, they must be passed to Init(), either as a list of keys/values or in a config file. Setting them after initialization has no effect, because by then the traineddata files have already been read and the dictionaries built.
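For reference, a Tesseract config file is plain text with one parameter per line: the name, a space, then the value. Tesseract's own name for the suffix parameter is user_words_suffix. The sketch below serializes a JavaScript object of init-only parameters into that format. `toTesseractConfig` is a hypothetical helper, not part of tesseract.js, and the "user-words" suffix value is only an example.

```javascript
// Hypothetical helper (not a tesseract.js API): turn an object of
// init-only parameters into Tesseract config-file text, one
// "name value" pair per line.
function toTesseractConfig(params) {
  const lines = Object.entries(params).map(([name, value]) => {
    const text = String(value);
    // Names are single word tokens; a value must not span multiple lines.
    if (!/^\w+$/.test(name) || /[\r\n]/.test(text)) {
      throw new Error(`invalid parameter: ${name}`);
    }
    return `${name} ${text}`;
  });
  return lines.join("\n") + "\n";
}

// Example: disable the system and frequency dictionaries and point
// Tesseract at a user word list (the suffix value is illustrative).
const config = toTesseractConfig({
  load_system_dawg: "0",
  load_freq_dawg: "0",
  user_words_suffix: "user-words",
});
console.log(config);
```

A file with this content is what the config-file suggestion would hand to api.Init(...) at startup, which is the only point at which these values are read.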

Suggested fix: Add an optional config filename parameter (a string) to worker.initialize(...) that is passed through to api.Init(...).

Alternative fixes: Add a worker.setInitParameters() function, analogous to worker.setParameters(), that must be called before worker.initialize() and whose keys/values are passed to api.Init(). Or add an optional "initParams" parameter to worker.initialize() containing key/value pairs that are passed to api.Init().
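The setInitParameters() and initParams proposals both come down to one ordering rule: init-only values must be collected before Init() runs. The sketch below models that rule with a stand-in object. SketchWorker, setInitParameters, and the initParams argument are hypothetical names modeled on this proposal, not tesseract.js APIs, and no real Tesseract call is made.

```javascript
// Hypothetical sketch of the proposed API; not tesseract.js code.
// No Tesseract call is made: initialize() just records what it would
// hand to api.Init() as parallel key/value lists.
class SketchWorker {
  constructor() {
    this.initParams = {};
    this.initialized = false;
    this.initCall = null;
  }

  // Proposed setInitParameters(): only legal before initialize(),
  // because init-only values are read once, when Init() runs.
  setInitParameters(params) {
    if (this.initialized) {
      throw new Error("init-only parameters must be set before initialize()");
    }
    Object.assign(this.initParams, params);
  }

  // Proposed optional initParams argument; its values override any
  // set earlier through setInitParameters().
  initialize(langs, initParams = {}) {
    const merged = { ...this.initParams, ...initParams };
    this.initCall = {
      langs,
      keys: Object.keys(merged),
      values: Object.values(merged).map(String),
    };
    this.initialized = true;
  }
}

const worker = new SketchWorker();
worker.setInitParameters({ load_freq_dawg: "0" });
worker.initialize("eng", { load_system_dawg: "0" });
console.log(worker.initCall.keys); // load_freq_dawg, load_system_dawg

try {
  // Too late: the worker is already initialized, so this throws.
  worker.setInitParameters({ load_number_dawg: "0" });
} catch (err) {
  console.log(err.message);
}
```

Both shapes end in the same single Init() call. The config-file route avoids adding a new method, at the cost of moving the key/value pairs into a file.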

I'm suggesting the config file option because it feels like the least work to get the desired result.

clnoel · May 16, 2022 15:05

I looked into it, and the init-only parameters appear to be few in number and relatively fringe (notably those that disable various dictionaries). However, I agree it would be useful for advanced users to be able to specify a config file, as they can with desktop Tesseract. I will look into whether this can easily be added in a future release.

Balearica · Sep 22, 2022 01:09

This feature has been added to the dev/v4 branch, and will be released with version 4. If you would like to test before then, instructions are in #662.

To make it easy to verify that these options are actually being applied, I am attaching a test image that gives noticeably different results with the legacy model (oem: "0") depending on whether load_number_dawg is enabled.

[Attached test image: number_test2]

Results with load_number_dawg: "1"

1823747 72460000
271.83 1223.00
3164675.10 1512895284

Results with load_number_dawg: "0"

18237.47 724600.00
271.83 1223.00
3164675.10 15128952.84
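The two result sets can be compared token by token. The sketch below copies the output lines shown above and reports which tokens changed. The function name diffTokens is illustrative only. Three of the six tokens differ, and each differs only by a decimal point that is missing when load_number_dawg is "1".

```javascript
// OCR output lines copied from the two runs above (legacy model, oem "0").
const numberDawgOn = ["1823747 72460000", "271.83 1223.00", "3164675.10 1512895284"];
const numberDawgOff = ["18237.47 724600.00", "271.83 1223.00", "3164675.10 15128952.84"];

// Compare the runs token by token and note whether each difference is
// explained by a single dropped decimal point.
function diffTokens(on, off) {
  const diffs = [];
  on.forEach((line, i) => {
    const onTokens = line.split(/\s+/);
    const offTokens = off[i].split(/\s+/);
    onTokens.forEach((tok, j) => {
      if (tok !== offTokens[j]) {
        diffs.push({
          line: i + 1,
          dawgOn: tok,
          dawgOff: offTokens[j],
          onlyDecimalPoint: tok === offTokens[j].replace(".", ""),
        });
      }
    });
  });
  return diffs;
}

const changed = diffTokens(numberDawgOn, numberDawgOff);
console.table(changed);
```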

Balearica · Sep 24, 2022 01:09

Closing, as this was added in version 4.

Balearica · Nov 25, 2022 20:11