tesstrain
Add --vertical_fontlist option to tesstrain.py
Ported from https://github.com/tesseract-ocr/tesseract/pull/3434 (not merged).
This pull request adds a --vertical_fontlist option to tesstrain.sh to specify a list of font names used to render vertical text. The list uses the same format as the --fontlist option. If --vertical_fontlist <FONTS> is specified, it overrides the VERTICAL_FONTS variable (defined in language-specific.sh) with the given font names.
In the current version, the VERTICAL_FONTS variable is hardcoded in language-specific.sh. So when creating training data for vertical text such as Japanese, users have to edit the source code even if they already specify fonts with --fontlist and --font_dir.
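The override behaviour described above can be sketched with `argparse`. This is a minimal illustration, not the code from the PR: it assumes that, like `--fontlist`, the new option takes one or more font names (`nargs="+"`), and `DEFAULT_VERTICAL_FONTS`, `build_parser` and `resolve_vertical_fonts` are hypothetical names standing in for the hardcoded VERTICAL_FONTS list and the script's own option handling.

```python
import argparse
from typing import List

# Hypothetical placeholder for the VERTICAL_FONTS list that is hardcoded
# per language; the real list depends on the language being trained.
DEFAULT_VERTICAL_FONTS = ["Font A", "Font B"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with --fontlist and the proposed --vertical_fontlist."""
    parser = argparse.ArgumentParser(description="Sketch of --vertical_fontlist")
    # Both options accept one or more font names (assumed format).
    parser.add_argument("--fontlist", dest="fonts", nargs="+", type=str)
    parser.add_argument("--vertical_fontlist", dest="vertical_fonts", nargs="+", type=str)
    return parser


def resolve_vertical_fonts(args: argparse.Namespace) -> List[str]:
    """Return the vertical fonts: the CLI list if given, else the default."""
    if args.vertical_fonts:
        # The option overrides the hardcoded list.
        return list(args.vertical_fonts)
    # Without the option, behaviour is unchanged.
    return list(DEFAULT_VERTICAL_FONTS)
```

Usage: `resolve_vertical_fonts(build_parser().parse_args(["--vertical_fontlist", "My Vertical Font"]))` returns `["My Vertical Font"]`, so no source edit is needed to change the vertical fonts.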
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Looks like this was closed accidentally
For vertical text, it is hard to use without this option. However, I have developed a more powerful script to replace text2image/tesstrain.py, so I don't use tesstrain.py anymore.
If I understand "Makefile training" correctly, this "src/tesstrain" is not used for training at the moment. AFAIK these Python scripts are based on the old shell training scripts. I suggest keeping them; maybe we can find a way to integrate them into the current training process...
@nagadomi : can you update this PR to recent git code?
As described in my request in #307, there are basically two ways to train Tesseract inside this repository:
- The Makefile-based approach to train on real data.
- The Python-based approach to train on artificial data, corresponding to the old Bash-based approach.
My suggestion had been to actually document this somewhere, as it is not always clear, but due to the harsh stale automation and (at least in the past) rather restricted responses, this has been buried in closed issues.
As I mentioned in another PR, I am interested in Python-based training, as "make training" is difficult to run on Windows and it requires tools that could easily be replaced by Python (unzip, wget, bc, ...). I would suggest merging/reworking the currently open PRs and then moving forward (reviewing issues/PRs marked as "stale"...).
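To illustrate the point about replacing external tools, here is a rough sketch of how the `wget` and `unzip` steps could be done with the Python standard library alone. The helper names `download` and `unzip` are hypothetical and not part of tesstrain; arithmetic that a Makefile hands to `bc` would simply become ordinary Python expressions.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download(url: str, dest: Path) -> Path:
    """Rough stand-in for `wget -O DEST URL` (hypothetical helper)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve writes the response body to dest.
    urllib.request.urlretrieve(url, str(dest))
    return dest


def unzip(archive: Path, target_dir: Path) -> List[str]:
    """Rough stand-in for `unzip ARCHIVE -d TARGET_DIR` (hypothetical helper).

    Returns the names of the archive members that were extracted.
    """
    with zipfile.ZipFile(archive) as zf:
        # extractall creates target_dir and any subdirectories as needed.
        zf.extractall(target_dir)
        return zf.namelist()
```

Because both helpers use only `urllib.request` and `zipfile`, they behave the same on Windows, Linux and macOS without extra tools on the PATH.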
committed in https://github.com/tesseract-ocr/tesstrain/commit/2c7c6e8feaf8aa1f2d1750b689fa46473453885e