
Incorrect/outdated documentation in README.md

Open pratheesh-prakash opened this issue 2 years ago • 4 comments

In general, the documentation provided in README.md is very vague, and doesn't explain the training parameters and their impact on the output model.

Apart from the above, the information provided in the README.md is incorrect and outdated. Here are some major issues I have noticed.

Line 126 of README.md says

FINETUNE_TYPE Finetune Training Type - Impact, Plus, Layer or blank. Default: ''

However, the Makefile doesn't seem to use this parameter anywhere, and the help output (available through make help) also omits it. Is that because the option was removed in later versions, or because the Makefile is outdated? Additionally, there is no information whatsoever on how these values (i.e. Impact, Plus, Layer or '') would influence the training.

To plot the CER, README.md says the user must run './plot/plot_cer.sh'. Unfortunately, no such shell script exists in 'plot'. Additionally, the Python scripts provided in 'plot' work only if the log file has first been parsed into a CSV.
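To illustrate the missing step, here is a minimal sketch of parsing a training log into a CSV. It is not the repository's method. The log line shape in the comment, the meaning of the three iteration counters, the CSV column names, and the helper names `log_to_rows` and `write_csv` are all assumptions, and the output is not guaranteed to match what the scripts in 'plot' expect.

```python
import csv
import io
import re

# ASSUMED log line shape (not confirmed anywhere in this thread), e.g.:
#   At iteration 100/200/200, Mean rms=1.2%, delta=3.4%, char train=12.5%, word train=30.1%, skip ratio=0%
# Some versions may print "BCER train" / "BWER train" instead of "char train" / "word train".
LINE_RE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?"
    r"(?:char|BCER) train=([\d.]+)%,\s*"
    r"(?:word|BWER) train=([\d.]+)%"
)

# Hypothetical column names chosen for this sketch only.
HEADER = ["learning_iteration", "training_iteration", "char_error", "word_error"]


def log_to_rows(lines):
    """Yield one tuple per log line that matches the assumed pattern."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield (int(m.group(1)), int(m.group(2)),
                   float(m.group(4)), float(m.group(5)))


def write_csv(rows, out):
    """Write a header line followed by the parsed rows to a text stream."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Made-up sample input, used only to exercise the functions above.
SAMPLE = [
    "Loaded 1/1 lines (1-1) of document example.lstmf",
    "At iteration 100/200/200, Mean rms=1.2%, delta=3.4%, "
    "char train=12.5%, word train=30.1%, skip ratio=0%",
]

buffer = io.StringIO()
write_csv(log_to_rows(SAMPLE), buffer)
print(buffer.getvalue())
```

For a real log, the same functions could be fed an open file object instead of `SAMPLE`; lines that do not match the assumed pattern are skipped.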

The documentation also does not explain how to interpret the results, how to optimise the hyperparameters, or how to improve the training data (e.g. how to prevent 'Compute CTC targets failed' errors).

It would be great if README.md were updated with the latest information and gave a clearer, more detailed explanation of the various parameters.

pratheesh-prakash avatar Oct 05 '22 05:10 pratheesh-prakash

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 13 '22 00:11 stale[bot]

@pratheesh-prakash, do you want to send a pull request which improves that documentation?

stweil avatar Nov 15 '22 09:11 stweil

@stweil: I really wish I could contribute to tesseract-ocr, but I do not have in-depth knowledge of the issues I have raised. I only checked the documentation to clarify those doubts, and found the information either missing or outdated. I would suggest that the update be done by one of the developers.

pratheesh-prakash avatar Nov 15 '22 10:11 pratheesh-prakash

Some details and an explanation of what happened are in #257.

zdenop avatar Feb 20 '23 17:02 zdenop