tesstrain
The box file is overwritten in training process
Hello,
I followed the training procedure and generated the .gt.txt and .box files for the line images with the help of tesseract.
Then I corrected/annotated the .gt.txt and .box files, added them to the data directory, and started the training.
During training, all the .box files are overwritten. Why is this happening?
For example, let's take this image,
and the corresponding box file, which is overwritten in the training process, is
Here, in the .box file, I did not annotate the 7th line, which contains \t. All the .box files contain the \t at the end.
If we assume that the \t at the end is there because the image has extra space at the end of the line, then the image also has space at the start of the line, but the .box file has no spaces or tabs at the start.
What is the intended workflow for the .box annotation files?
Is it possible to stop the .box annotations from being overwritten during training?
Thanks
Also, the coordinates are the same for all characters, but shouldn't the box file have separate coordinates for each character?
Please provide an example case for replicating the problem. Also, which training procedure did you follow? Please provide a link.
Hi, I followed the training procedure described in the README of this repo, with the help of this tutorial: https://www.youtube.com/watch?v=KE4xEzFGSU8. It has good content and makes the training steps easy to understand.
I also tried on a new system today, and the same issue happens again.
@vishakraj25 In my case, it turned out that I didn't even have to create any box files myself https://github.com/tesseract-ocr/tesstrain/issues/338#issuecomment-1487982907
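For anyone wondering why the overwritten files look the way they do, here is a minimal sketch of how a line-level box file could be derived from a .gt.txt line and the image size. It is an assumption based on my reading of how tesstrain regenerates box files from the ground truth, not the project's actual script: the function name `line_box_entries` is hypothetical, and the handling of combining characters is left out. Under that assumption, every character gets the bounding box of the whole line, leading and trailing whitespace in the text is stripped, and a final \t entry marks the end of the line.

```python
import unicodedata
from typing import List


def line_box_entries(text: str, width: int, height: int) -> List[str]:
    """Build line-level box entries for one ground-truth line.

    Hypothetical helper: a simplified sketch, not tesstrain's own code.
    """
    # Assumption: the ground truth is stripped and NFC-normalized first,
    # which would explain why no leading spaces or tabs show up.
    line = unicodedata.normalize("NFC", text.strip())
    entries: List[str] = []
    if not line:
        return entries
    for char in line:
        # Every character shares the box of the whole line image.
        entries.append(f"{char} 0 0 {width} {height} 0")
    # A trailing tab entry marks the end of the text line.
    entries.append(f"\t {width} {height} {width + 1} {height + 1} 0")
    return entries


if __name__ == "__main__":
    for entry in line_box_entries("  ab ", 100, 20):
        print(repr(entry))
```

If this matches what the training run does, hand-edited per-character coordinates would be replaced on the next run, and only the .gt.txt text would need to be corrected.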