Training Suggestions for Cyrillic + English
Would it be possible to get training recommendations w.r.t. data and parameters?
I'm trying to retrain parseq with a new character set consisting of both Latin (English alphabet) and Cyrillic (Russian alphabet) characters.
I have about 3500 custom image samples that I created by running detection and then cropping out the text. Example: [attached image of a cropped word]
I have a few questions:
1. Is this a suitable training image?
2. If I have ~3500 images like these, both in English and Russian, how much synthetic data should I augment them with? Or do I need more real data too?
3. My charset is ~160 characters; what should I set the embedding dimension to? Is 384 large enough?
Thank you, and I appreciate any suggestions you can give!
Is this a suitable training image?
Yes, this works. Just be careful with data augmentation; you might want to reduce the augmentation magnitudes first.
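To give a concrete idea of what "reduce the magnitudes" means, here is a generic torchvision sketch (illustrative only; parseq ships its own augmentation pipeline, and the 32x128 input size below is an assumption):

```python
from torchvision import transforms

# Illustrative sketch: torchvision's RandAugment defaults to magnitude=9.
# Lowering it (e.g. to 3) applies much gentler geometric/color distortions,
# which is safer for small, already-clean word crops like yours.
train_transform = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=3),  # reduced from the default 9
    transforms.Resize((32, 128)),  # assumption: parseq-style 32x128 input
    transforms.ToTensor(),
])
```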
If I have ~3500 images like these, both in English and Russian, how much synthetic data should I augment them with? Or do I need more real data too?
Real data is much better. Or rather, the closer the training data distribution is to the test data distribution, the better. Try using the pretrained weights, at least for the encoder.
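If it helps, the pretrained weights can be pulled through torch.hub, as shown in the repository README:

```python
import torch

# Load the pretrained PARSeq model (from the README's usage example).
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()
```

Note that the full checkpoint assumes the original charset; with a ~160-character charset, the decoder's embedding and output layers won't match, which is why reusing just the encoder is the practical option (see below).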
My charset is ~160 characters; what should I set the embedding dimension to? Is 384 large enough?
The depth of the encoder has a much bigger effect on model performance than the embedding dimension. But if you can use a larger number, use it. Larger models are easier to work with in the experimentation phase.
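One way to see the cost side of this trade-off: per-layer parameters scale roughly with the square of the embedding dimension, while depth scales them linearly. A rough sketch using timm's VisionTransformer as a stand-in for the encoder (the input and patch sizes below are assumptions, not necessarily parseq's defaults):

```python
from timm.models.vision_transformer import VisionTransformer

def encoder_params(depth: int, embed_dim: int, num_heads: int) -> int:
    # Stand-in ViT encoder; num_classes=0 drops the classification head.
    model = VisionTransformer(
        img_size=(32, 128), patch_size=(4, 8),  # assumption: parseq-like input
        embed_dim=embed_dim, depth=depth, num_heads=num_heads, num_classes=0,
    )
    return sum(p.numel() for p in model.parameters())

print(encoder_params(depth=12, embed_dim=384, num_heads=6))   # ViT-S-like baseline
print(encoder_params(depth=24, embed_dim=384, num_heads=6))   # 2x depth -> ~2x params
print(encoder_params(depth=12, embed_dim=768, num_heads=12))  # 2x width -> ~4x params
```

Doubling depth roughly doubles the encoder's parameters, while doubling the embedding dimension roughly quadruples them.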
Real data is much better. Or rather, the closer the training data distribution is to the test data distribution, the better. Try using the pretrained weights, at least for the encoder.
How do I use just the encoder pretrained weights?
I was thinking of using ~10M synthetic images generated with SynthTIGER (https://github.com/clovaai/synthtiger); does that seem sufficient?
The depth of the encoder has a much bigger effect on model performance than the embedding dimension. But if you can use a larger number, use it. Larger models are easier to work with in the experimentation phase.
What do you suggest setting the depth and embedding dimension to?
How do I use just the encoder pretrained weights?
Take a look at the examples for finetuning with PyTorch. In a nutshell, you load the model and discard the layers you want to replace (in this case, the decoder).
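A minimal sketch of that, assuming the encoder submodule is named `encoder` in the checkpoint (inspect `state_dict().keys()` to confirm for your version), and using an untrained hub model as a stand-in for your custom-charset model:

```python
import torch

# Pretrained source model (weights to copy from).
pretrained = torch.hub.load('baudm/parseq', 'parseq', pretrained=True)

# Stand-in target: in practice, build this from your training config with the
# ~160-character charset; here we just instantiate an untrained model.
new_model = torch.hub.load('baudm/parseq', 'parseq', pretrained=False)

# Keep only the encoder weights; the decoder and output head are discarded
# and remain randomly initialized (strict=False tolerates the missing keys).
enc_state = {k: v for k, v in pretrained.state_dict().items()
             if k.startswith('encoder.')}
missing, unexpected = new_model.load_state_dict(enc_state, strict=False)
print(f'randomly initialized (not loaded): {len(missing)} tensors')
```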