Write the labels to separate files

Open DesBw opened this issue 8 months ago • 0 comments

I am trying to use the labels as ground-truth texts for Tesseract. Currently, trdg writes all the labels into a single labels.txt file. As far as I can tell, this is the part of the script that writes them to that single file:

if args.name_format == 2:
      # Create file with filename-to-label connections
      with open(
          os.path.join(args.output_dir, "labels.txt"), "w", encoding="utf8"
      ) as f:
          for i in range(string_count):
              file_name = str(i) + "." + args.extension
              label = strings[i]
              if args.space_width == 0:
                  label = label.replace(" ", "")
              f.write("{} {}\n".format(file_name, label))

Could someone with sufficient knowledge of Python help me modify it so that each label is written to its own file?

What I want is:

0.gt.txt
1.gt.txt

Each of those files would contain its respective label.
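
One way this could be done, as a minimal sketch rather than anything taken from trdg itself: loop over the generated strings and open one file per label instead of a single labels.txt. The helper name `write_label_files` and its parameters are hypothetical; the `<index>.gt.txt` naming simply follows the file names requested above, and the space-stripping rule is copied from the quoted snippet. Whether Tesseract's training tools need a trailing newline in each file is not checked here.

```python
import os


def write_label_files(strings, output_dir, space_width=1.0):
    """Write one <index>.gt.txt file per label (hypothetical helper, not part of trdg)."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, label in enumerate(strings):
        # Same rule as the quoted snippet: drop spaces when space_width is 0.
        if space_width == 0:
            label = label.replace(" ", "")
        # 0.gt.txt, 1.gt.txt, ... so each label file shares its index with image i.
        path = os.path.join(output_dir, "{}.gt.txt".format(i))
        with open(path, "w", encoding="utf8") as f:
            f.write(label)
        paths.append(path)
    return paths
```

Assuming the variable names from the quoted snippet, the body of the `if args.name_format == 2:` block could then be replaced with a call such as `write_label_files(strings, args.output_dir, args.space_width)`.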

DesBw avatar Nov 08 '23 11:11 DesBw