TextRecognitionDataGenerator
Generate rendered text in PyTorch dataset
Hi there,
I would like to use your Python API directly in my PyTorch dataset to render text from a dataset of strings. Unfortunately, my dataset does not fit in memory, so I cannot use the GeneratorFromStrings class. An easy solution would be to instantiate GeneratorFromStrings inside __getitem__ every time I want to generate a new image. However, I suspect this would be very costly, because the code would have to load a font every time. Any suggestions?
Your PyTorch dataset could hold the GeneratorFromStrings object, and you could implement lazy loading for the strings in your dataset.
Since your dataset is only strings, you could most likely split it with split -l 20000 filename and only load the file that contains the requested sample.
Just an idea. I don't think I have enough visibility to really help you, but you are most likely right that instantiating GeneratorFromStrings every time would be costly.
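To make the lazy-loading idea a bit more concrete, here is a rough sketch of a map-style dataset that reads shards produced by split -l on demand and keeps only one shard in memory. Everything in it is an assumption for illustration: the class name ShardedStringDataset, the lines_per_shard argument, and render_fn, a hypothetical callable standing in for whatever long-lived renderer you build once (so fonts are not reloaded per item). It is not part of the TextRecognitionDataGenerator API, and in real use the class would subclass torch.utils.data.Dataset.

```python
from pathlib import Path


class ShardedStringDataset:
    """Map-style dataset over text shards created with `split -l N filename`.

    Hypothetical sketch: `render_fn` is any callable that turns a string into
    an image; it is created once by the caller and reused for every item.
    """

    def __init__(self, shard_dir, lines_per_shard, render_fn):
        # `split` names its output xaa, xab, ... so a lexicographic sort
        # restores the original line order.
        self.shards = sorted(p for p in Path(shard_dir).iterdir() if p.is_file())
        if not self.shards:
            raise ValueError(f"no shard files found in {shard_dir}")
        self.lines_per_shard = lines_per_shard
        self.render_fn = render_fn
        self._cached_id = None
        self._cached_lines = []
        # Every shard is full except possibly the last one, so reading the
        # last shard once is enough to know the total length.
        last = self._read(len(self.shards) - 1)
        self._length = (len(self.shards) - 1) * lines_per_shard + len(last)

    def _read(self, shard_id):
        # Only hit the disk when the requested shard is not the cached one.
        if shard_id != self._cached_id:
            with open(self.shards[shard_id], encoding="utf-8") as f:
                self._cached_lines = f.read().splitlines()
            self._cached_id = shard_id
        return self._cached_lines

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        if not 0 <= idx < self._length:
            raise IndexError(idx)
        shard_id, offset = divmod(idx, self.lines_per_shard)
        text = self._read(shard_id)[offset]
        return self.render_fn(text), text
```

One caveat with this sketch: fully random shuffling makes consecutive indices land in different shards, so the single-shard cache is reloaded on almost every access. Shuffling within a shard, or visiting shards in random order and samples randomly inside each, would likely keep the disk reads manageable.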