TextRecognitionDataGenerator

Only 1000 unique strings are being generated

Open shankar-surepass opened this issue 5 years ago • 2 comments

While using `GeneratorFromRandom`, only 1000 random strings are generated. I checked the source code but could not find any issue. I had changed the symbol pool, but the letter and digit pools are unchanged. Is there some issue with Python's `random` module?

shankar-surepass, Feb 01 '20 06:02

Just to be clear, it generates only 1000 samples and then stops, or generates only 1000 unique samples and then starts over with the same images?

Could you provide a reproducible test case?
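A minimal sketch of such a check, which separates the two cases above: it consumes a bounded number of `(image, label)` pairs and reports the total count and the number of unique labels. It assumes, as in trdg's README examples, that the generators yield `(image, label)` tuples; `summarize_labels` is a hypothetical helper, not part of trdg. Fed with something like `GeneratorFromRandom(count=3000)` (assuming that `count` parameter), a result of `(3000, 1000)` would point to a repeating pool rather than an early stop.

```python
from itertools import count, islice
from typing import Any, Iterable, Tuple


def summarize_labels(pairs: Iterable[Tuple[Any, str]], limit: int) -> Tuple[int, int]:
    """Hypothetical helper: consume at most `limit` (image, label) pairs.

    Returns (total_seen, unique_labels):
    - total_seen < limit            -> the generator stopped early
    - unique_labels << total_seen   -> the generator cycles over a fixed pool
    """
    labels = [label for _, label in islice(pairs, limit)]
    return len(labels), len(set(labels))


# Stand-in for a real generator: yields (None, label) forever, cycling over 3 labels.
fake_pairs = ((None, f"s{i % 3}") for i in count())
print(summarize_labels(fake_pairs, 10))  # (10, 3)
```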

Belval, Feb 01 '20 22:02

I got the same issue, and I just noticed that in the `GeneratorFromRandom` class (`from_random.py`), the `__iter__` method returns `self.generator`. Since a `for` loop calls `__next__` on whatever object `__iter__` returns, `GeneratorFromRandom.__next__` is never called; `GeneratorFromStrings.__next__` is called instead.
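The delegation problem can be reproduced without trdg. The sketch below uses hypothetical toy classes (`ToyStringsGenerator`, `ToyRandomGenerator`, `random_strings`) that only mimic the structure described above; they are not trdg's actual code. Because `__iter__` hands back the inner object, `islice` (like a `for` loop) calls the inner `__next__`, so the pool-refreshing logic in the outer `__next__` never runs.

```python
import random
import string
from itertools import islice


def random_strings(n, length=6):
    """Hypothetical helper: n random strings of ASCII letters."""
    return ["".join(random.choices(string.ascii_letters, k=length)) for _ in range(n)]


class ToyStringsGenerator:
    """Hypothetical stand-in for GeneratorFromStrings: cycles over a fixed list."""

    def __init__(self, strings):
        self.strings = strings
        self.generated_count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.generated_count += 1
        # Wraps around to the start once every string has been used.
        return self.strings[(self.generated_count - 1) % len(self.strings)]


class ToyRandomGenerator:
    """Hypothetical stand-in for GeneratorFromRandom, reproducing the bug."""

    def __init__(self, pool_size=1000):
        self.pool_size = pool_size
        self.generator = ToyStringsGenerator(random_strings(pool_size))

    def __iter__(self):
        # Bug: iteration is handed to the inner generator, so the
        # __next__ method below is never reached by a for loop.
        return self.generator

    def __next__(self):
        # Intended behaviour: refresh the pool at each pool boundary.
        used = self.generator.generated_count
        if used and used % self.pool_size == 0:
            self.generator.strings = random_strings(self.pool_size)
        return next(self.generator)


gen = ToyRandomGenerator(pool_size=1000)
labels = list(islice(gen, 3000))
print(len(labels), len(set(labels)))  # 3000 labels, at most 1000 unique
```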

A quick fix:

1. Return `self` instead of `self.generator` in the `__iter__` method.
2. Add a `generated_count` variable, increment it in each `__next__` call, and raise `StopIteration` when it reaches `self.count`.
3. In the `GeneratorFromRandom.next` method, create a new generator in `self.generator` when `generated_count >= 999`.
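The three steps can be sketched on the same kind of toy classes; all names here (`random_strings`, `ToyStringsGenerator`, `FixedToyRandomGenerator`) are hypothetical and this is not trdg's code. One deliberate difference from the comment: this sketch swaps in a fresh inner generator once its whole pool of `pool_size` strings has been used, rather than at the `>= 999` threshold, so that every string in each pool is used once.

```python
import random
import string


def random_strings(n, length=6):
    """Hypothetical helper: n random strings of ASCII letters."""
    return ["".join(random.choices(string.ascii_letters, k=length)) for _ in range(n)]


class ToyStringsGenerator:
    """Hypothetical stand-in for GeneratorFromStrings: cycles over a fixed list."""

    def __init__(self, strings):
        self.strings = strings
        self.generated_count = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.generated_count += 1
        return self.strings[(self.generated_count - 1) % len(self.strings)]


class FixedToyRandomGenerator:
    """Hypothetical stand-in for GeneratorFromRandom with the three-step fix."""

    def __init__(self, count, pool_size=1000):
        self.count = count  # total number of strings to produce (positive int)
        self.pool_size = pool_size
        self.generated_count = 0  # step 2: the outer generator keeps its own counter
        self.generator = ToyStringsGenerator(random_strings(pool_size))

    def __iter__(self):
        # Step 1: return self so that for loops call our __next__.
        return self

    def __next__(self):
        # Step 2: stop after `count` strings.
        if self.generated_count >= self.count:
            raise StopIteration
        # Step 3: once the current pool is exhausted, build a new inner generator.
        if self.generator.generated_count >= self.pool_size:
            self.generator = ToyStringsGenerator(random_strings(self.pool_size))
        self.generated_count += 1
        return next(self.generator)


labels = list(FixedToyRandomGenerator(count=3000, pool_size=1000))
print(len(labels))  # 3000
```

With this shape, the number of unique labels is no longer capped at `pool_size`; with random strings it will typically be close to `count`.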

You might find a more elegant way to solve this, but at least it's quick and it works.

P.S. The same problem exists in `GeneratorFromWikipedia`.

AghilesAzzoug, Jan 20 '21 16:01