doctr
doctr copied to clipboard
"Universal" European text recognition model.
🚀 The feature
Train a model on a vocabulary that subsumes all of the similar European languages (German, French, Portuguese, Spanish, Czech, Polish, etc.) These languages only differ in a few letters.
Motivation, pitch
In all kinds of real-world texts, names from different languages appear, for example, a German text could contain French names.
Alternatives
No response
Additional context
No response
Hi @Xargonus ,
sounds really interesting 👍 The main question would be where to get data /lots of data (in best case real and balanced 😅 ). Feel free to share your ideas 😄 For your interest there is currently a wordgenerator implemented but it needs some improvements 👍 otherwise you can test trdg lib to generate some synthetic data
@Xargonus any update ? :)
Hello @Xargonus :wave:
As suggested by Felix, the word generator of docTR can take in any character set. So if your interest is in visual recognition and not semantic understanding, I guess this is doable using these tools!
What do you think? If I misunderstood, feel free to elaborate your request :)
Hi @Xargonus 👋 any updates ?
Hi @Xargonus :wave:, would ask again if there are any updates from your side ? :)
Closing because no response