Robert Sachunsky comments

Results: 735 comments of Robert Sachunsky

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

I have no idea how to generate these files (except extracting from their respective script models). @stweil, your published data directories do contain such files – did you put them...

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

Perhaps we are missing the original `set_unicharset_properties` rule, which enriches the generated `unicharset` for the model?

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

> I copied them from https://github.com/tesseract-ocr/langdata_lstm (or used local symbolic links to a local copy of that repository). That fixes most warnings (all but `Inherited.unicharset`).

Oh, I see! But how...

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

> langdata_lstm is not a small repository, so I don't like the idea of having it as a subrepository.
>
> Documenting the requirement could be a first step.

Parsing...

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

Done. Please re-review!

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

> Done. Please re-review!

Or should we place all `*.unicharset` and `radical-stroke.txt` into a subdirectory `langdata` to keep `DATA_DIR` tidy? (Would only need to change the `script_dir` argument ...)

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

> Or should we place all `*.unicharset` and `radical-stroke.txt` into a subdirectory `langdata` to keep `DATA_DIR` tidy? (Would only need to change the `script_dir` argument ...)

Let's do this! That...

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

Done. I have also updated from master, resolving the conflict manually, and added two minor improvements to the rules for `all-gt` / `all-lstmf`.

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

There was some additional fallout to the `all-lstmf` / `all-gt` speedups (by not repeating `find`): with large directories, the `paste` recipe would quickly run into `E2BIG` (because not all command-line...
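
For context, `E2BIG` is the error the operating system reports when a command's argument list (plus environment) exceeds the system limit, which is easy to hit when thousands of file names are expanded onto one command line. A common way around it is to hand the names over in a list file instead. The following is only a rough sketch of that idea in Python, not the Makefile change made in this pull request; the function name `concat_listed_files` and the one-name-per-line list format are assumptions for illustration.

```python
from pathlib import Path


def concat_listed_files(list_file: Path, out_file: Path) -> int:
    """Concatenate every file named in list_file into out_file.

    The names are read from list_file (one per line) rather than passed
    as command-line arguments, so their number is not limited by the
    argument-length limit that triggers E2BIG.
    Returns the number of files that were concatenated.
    """
    count = 0
    with list_file.open(encoding="utf-8") as names, \
            out_file.open("w", encoding="utf-8") as out:
        for name in names:
            name = name.rstrip("\n")
            if not name:
                continue  # skip blank lines in the list
            text = Path(name).read_text(encoding="utf-8")
            # Make sure each file contributes complete lines.
            out.write(text if text.endswith("\n") else text + "\n")
            count += 1
    return count
```

The list file itself can be produced by a single directory scan, so the scan is still not repeated for every target.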

explicate .lstm-unicharset and my.unicharset prereqs for finetuning

> it would help me a lot if you could make separate pull requests for your commits instead of adding more and more commits to this one. That also increases...