TextRecognitionDataGenerator
Write the labels to separate files
I am trying to use the labels as ground-truth texts in Tesseract. Currently, trdg writes all the labels into a single labels.txt file. As far as I can tell, this is the part of the script that writes them to that single file:
    if args.name_format == 2:
        # Create file with filename-to-label connections
        with open(
            os.path.join(args.output_dir, "labels.txt"), "w", encoding="utf8"
        ) as f:
            for i in range(string_count):
                file_name = str(i) + "." + args.extension
                label = strings[i]
                if args.space_width == 0:
                    label = label.replace(" ", "")
                f.write("{} {}\n".format(file_name, label))
Could someone with sufficient knowledge of Python help me modify it so that each label is written to a separate file?
What I want is:
0.gt.txt
1.gt.txt
Each of those files would contain only its own label.
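For reference, here is a rough sketch of what I think the change might look like. It is untested against trdg itself: `write_label_files` is a hypothetical helper name I made up, and it assumes `strings` is the list of generated labels and that the images are named `0.<extension>`, `1.<extension>`, and so on, as in the snippet above.

```python
import os


def write_label_files(output_dir, strings, space_width=1):
    """Write each label to its own <index>.gt.txt file (hypothetical helper)."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, label in enumerate(strings):
        # Same behaviour as the original snippet: drop spaces when space_width is 0.
        if space_width == 0:
            label = label.replace(" ", "")
        # 0.gt.txt pairs with image 0.<extension>, 1.gt.txt with 1.<extension>, ...
        path = os.path.join(output_dir, "{}.gt.txt".format(i))
        with open(path, "w", encoding="utf8") as f:
            f.write(label)
        paths.append(path)
    return paths
```

Inside the `if args.name_format == 2:` branch, the call would presumably be something like `write_label_files(args.output_dir, strings, args.space_width)`, replacing the block that opens labels.txt. If the downstream tooling expects a trailing newline in each .gt.txt file, `f.write(label + "\n")` could be used instead.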