Robert Sachunsky
I have no idea how to generate these files (except extracting from their respective script models). @stweil, your published data directories do contain such files – did you put them...
Perhaps we are missing the original `set_unicharset_properties` rule, which enriches the generated `unicharset` for the model?
> I copied them from https://github.com/tesseract-ocr/langdata_lstm (or used local symbolic links to a local copy of that repository). That fixes most warnings (all but `Inherited.unicharset`).

Oh, I see! But how...
> langdata_lstm is not a small repository, so I don't like the idea of having it as a subrepository.
>
> Documenting the requirement could be a first step. Parsing...
Done. Please re-review!
> Done. Please re-review!

Or should we place all `*.unicharset` and `radical-stroke.txt` into a subdirectory `langdata` to keep `DATA_DIR` tidy? (Would only need to change the `script_dir` argument ...)
> Or should we place all `*.unicharset` and `radical-stroke.txt` into a subdirectory `langdata` to keep `DATA_DIR` tidy? (Would only need to change the `script_dir` argument ...)

Let's do this! That...
Done. I have also updated from master to manually resolve the conflict, and added two minor improvements to the rules for all-gt / all-lstmf.
There was some additional fallout to the `all-lstmf` / `all-gt` speedups (by not repeating `find`): with large directories, the `paste` recipe would quickly run into `E2BIG` (because not all command-line...
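For anyone following along, here is a minimal sketch of the general `E2BIG` workaround, not necessarily what this PR ends up doing. It assumes a throwaway directory and made-up file names (`gt_dir`, `all_gt` and the `*.gt.txt` contents are hypothetical), and it streams the file names through `xargs` instead of expanding them all onto one command line:

```shell
# Hypothetical stand-in for a ground-truth directory; names are illustrative only.
gt_dir=$(mktemp -d)
for i in 1 2 3; do
    printf 'line %s\n' "$i" > "$gt_dir/$i.gt.txt"
done

# Expanding every file name into a single command (paste -s file1 file2 ...)
# can fail with E2BIG once the argument list exceeds the system limit.
# Feeding the names to xargs lets it split them into batches that fit;
# paste -s emits one output line per input file, so batching does not
# change the result (apart from the order in which find lists the files).
all_gt="$gt_dir/all-gt"
find "$gt_dir" -name '*.gt.txt' -print0 | xargs -0 paste -s > "$all_gt"
```

The point is only that the file list travels over a pipe rather than through the argument vector, so the size of the directory no longer matters.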
> it would help me a lot if you could make separate pull requests for your commits instead of adding more and more commits to this one. That also increases...