
The .box files are overwritten during the training process

Open vishakraj25 opened this issue 2 years ago • 5 comments

Hello,

I followed the training procedure. As part of it, I generated the .gt.txt and .box files for the line images with the help of tesseract.

Then I corrected/annotated the .gt.txt and .box files, added them to the data directory, and started the training.

During the training process, all the .box files are overwritten. Why is this happening?

For example, let's take this image,

MT_Bank_1_22

and the corresponding .box file, as overwritten by the training process, is

Screenshot from 2023-02-24 11-57-51

Here, in the .box file, I did not annotate the 7th line, which contains \t. All the .box files contain a \t entry at the end.

Suppose the reason for the \t at the end is that the image has extra space at the end of the line.

But the image also has space at the start of the line, and the .box file has no spaces or tabs at the start.

What is the intended workflow for the .box annotation files?

Is it possible to prevent the .box annotations from being overwritten during the training process?

Thanks

vishakraj25 avatar Feb 24 '23 06:02 vishakraj25

Also, the coordinates are the same for all characters, but shouldn't the .box file have separate coordinates for each character?

vishakraj25 avatar Feb 24 '23 09:02 vishakraj25
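
For readers following the two observations above (identical coordinates on every line, and a trailing \t entry), here is a minimal Python sketch of how a line-level box file could be produced from a ground-truth line. It is an illustration under assumptions, not something confirmed in this thread: it assumes that every glyph is given the bounding box of the whole line image, and that the final tab entry is an end-of-line marker rather than something derived from whitespace in the image. The function name `line_box_entries` and its parameters are hypothetical.

```python
import unicodedata


def line_box_entries(text, width, height):
    """Build box-file lines for one text line (hypothetical helper).

    Assumption: every glyph shares the bounding box of the whole line
    image, and a final tab entry marks the end of the line.
    """
    # Normalize and drop leading/trailing whitespace of the transcription.
    text = unicodedata.normalize("NFC", text.strip())

    # Group combining marks with the character they modify.
    glyphs = []
    for ch in text:
        if glyphs and unicodedata.combining(ch):
            glyphs[-1] += ch
        else:
            glyphs.append(ch)

    # Same coordinates for every entry: left, bottom, right, top, page.
    box = f"0 0 {width} {height} 0"
    entries = [f"{g} {box}" for g in glyphs]
    if entries:
        # Assumed end-of-line marker: a tab with the same line box.
        entries.append(f"\t {box}")
    return entries


if __name__ == "__main__":
    for entry in line_box_entries("Hi", 100, 20):
        print(repr(entry))
```

Under these assumptions, a file with the same coordinates on every line and a \t entry only at the end would be the expected output, and space at the start of the image would not produce a matching entry.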

Please provide an example case for replicating the problem. Also, which training procedure did you follow? Please provide a link.

zdenop avatar Feb 24 '23 14:02 zdenop

Hi, I followed the training procedure described in the README file in this repo, with the help of this tutorial: https://www.youtube.com/watch?v=KE4xEzFGSU8. It has good content and makes the training steps easy to understand.

I also tried on a new system today, and the same issue happened again.

vishakraj25 avatar Feb 24 '23 17:02 vishakraj25

@vishakraj25 In my case, it turned out that I didn't even have to create any box files myself: https://github.com/tesseract-ocr/tesstrain/issues/338#issuecomment-1487982907

khashashin avatar Mar 29 '23 05:03 khashashin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 22 '23 01:05 stale[bot]