tesstrain
Feature Request: list.train and list.eval from different folders
The current implementation creates all-lstmf from the foo-ground-truth directory and splits it in two at the specified ratio using the head and tail commands.
The disadvantage of this approach is that when the training data contains only a few samples of some characters, there is no way to ensure that they are divided evenly between the training and eval groups. So it is quite possible that some characters are never used for training at all.
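To make the problem concrete, here is a simplified sketch of a ratio-based head/tail split. It is not the actual Makefile recipe: the sample file names, the temporary directory, and the `train_percent` variable are made up for illustration.

```shell
#!/bin/sh
# Simplified sketch of a ratio-based head/tail split.
# NOT the real tesstrain recipe; all names and values are illustrative.
set -eu

workdir=$(mktemp -d)
all_list="$workdir/all-lstmf"   # one .lstmf path per line
train_percent=90                # hypothetical ratio, as an integer percentage

# Ten made-up entries stand in for the real list of .lstmf files.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "foo-ground-truth/sample$i.lstmf"
done > "$all_list"

total=$(wc -l < "$all_list")
n_train=$((total * train_percent / 100))   # integer division rounds down
n_eval=$((total - n_train))

# The first n_train lines go to training, the remaining lines to eval.
head -n "$n_train" "$all_list" > "$workdir/list.train"
tail -n "$n_eval" "$all_list" > "$workdir/list.eval"

echo "train: $n_train lines, eval: $n_eval lines"
```

Which group a sample lands in depends only on its position in the list. If sample10 were the only line containing some rare character, that character would end up in list.eval and never be seen during training.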
I suggest letting the user specify two directories, one with training data and one with testing data.
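A rough sketch of what this could look like when done by hand, assuming the .lstmf files have already been generated in two separate directories. The directory names are hypothetical, and empty placeholder files stand in for real .lstmf files.

```shell
#!/bin/sh
# Sketch: build list.train and list.eval from two separate directories.
# Directory names are hypothetical; tesstrain does not define them.
set -eu

workdir=$(mktemp -d)
train_dir="$workdir/train-ground-truth"
eval_dir="$workdir/eval-ground-truth"
mkdir -p "$train_dir" "$eval_dir"

# Empty placeholder files stand in for real .lstmf files.
touch "$train_dir/a.lstmf" "$train_dir/b.lstmf" "$eval_dir/c.lstmf"

# One path per line, sorted so the lists are reproducible.
find "$train_dir" -name '*.lstmf' | sort > "$workdir/list.train"
find "$eval_dir" -name '*.lstmf' | sort > "$workdir/list.eval"

echo "train list:"; cat "$workdir/list.train"
echo "eval list:";  cat "$workdir/list.eval"
```

With this layout the user decides which samples go into which group simply by placing them in the corresponding directory.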
Additionally, it would be great to split the testing data further into two groups for eval and validation. One of the changes in PR #207 does this split with the existing head and tail approach. EDIT: see https://github.com/tesseract-ocr/tesstrain/pull/217
Hi, could you specify which command does this: "Current implementation creates all-lstmf from the foo-ground-truth directory"? Thanks!
could you specify which command does this?
make lists --trace should show you all the commands executed for making the lists.
Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
https://groups.google.com/g/tesseract-ocr/c/HFpYH5i7VRw/m/72tnGgCmDAAJ
Question regarding use of custom list.train and list.eval
It is always possible to create custom list.train and list.eval and use those instead of the ones created by the Makefile.
It could be documented, though.
However, there's a big catch: the timestamps matter. If your manual list.train and list.eval are older than any of the *.gt.txt files (or the derived *.lstmf files), they will be overwritten by the next make run. So perhaps we should offer some explicit manual override?
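Until such an override exists, one possible workaround is to refresh the modification time of the manual lists with touch. The sketch below only demonstrates the timestamp comparison in a temporary directory with placeholder files. It has not been tested against the actual Makefile and assumes that make decides purely on modification times.

```shell
#!/bin/sh
# Sketch: check whether a ground-truth file is newer than a manual list,
# then refresh the lists with touch. Placeholder files only.
set -eu

workdir=$(mktemp -d)
: > "$workdir/list.train"
: > "$workdir/list.eval"
sleep 1                          # make sure the next file gets a later mtime
: > "$workdir/sample.gt.txt"     # stands in for a newly added ground-truth file

# find -newer prints the file only if it is newer than list.train.
stale_before=$(find "$workdir/sample.gt.txt" -newer "$workdir/list.train")

# Refresh the manual lists so they are no longer older than the ground truth.
touch "$workdir/list.train" "$workdir/list.eval"

stale_after=$(find "$workdir/sample.gt.txt" -newer "$workdir/list.train")

echo "newer than list.train before touch: ${stale_before:-none}"
echo "newer than list.train after touch:  ${stale_after:-none}"
```

This likely helps only when every intermediate file derived from the ground truth is already up to date. If make regenerates one of them, that file will again be newer than the manual lists.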