"Universal" European text recognition model.

Open Xargonus opened this issue 2 years ago • 5 comments

🚀 The feature

Train a model on a vocabulary that subsumes all of the similar European languages (German, French, Portuguese, Spanish, Czech, Polish, etc.). These languages differ in only a few letters.

Motivation, pitch

Names from other languages appear in all kinds of real-world texts; for example, a German text could contain French names.

Alternatives

No response

Additional context

No response

Apr 05 '22 19:04 Xargonus

Hi @Xargonus,

Sounds really interesting 👍 The main question is where to get data, and lots of it (ideally real and balanced 😅). Feel free to share your ideas 😄 For your information, there is currently a word generator implemented, but it needs some improvements 👍 Otherwise, you can try the trdg library to generate some synthetic data.

Apr 06 '22 20:04 felixdittrich92

@Xargonus any update? :)

Apr 28 '22 21:04 felixdittrich92

Hello @Xargonus 👋

As suggested by Felix, the word generator of docTR can take in any character set. So if your interest is in visual recognition and not semantic understanding, I guess this is doable using these tools!

What do you think? If I misunderstood, feel free to elaborate your request :)
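
To make the "any character set" point concrete, here is a minimal sketch of how a merged vocabulary for several European languages could be assembled and sampled. The per-language letter lists are illustrative and incomplete, and `build_union_vocab` and `random_word` are hypothetical helper names, not part of docTR. The resulting string is the kind of character set one might then hand to a word generator.

```python
import random

# Assumption: illustrative, incomplete alphabets; not docTR's own vocab tables.
BASE_LATIN = "abcdefghijklmnopqrstuvwxyz"
LANGUAGE_EXTRAS = {
    "german": "äöüß",
    "french": "àâçéèêëîïôùûüÿœæ",
    "spanish": "áéíñóúü",
    "portuguese": "áâãàçéêíóôõú",
    "czech": "áčďéěíňóřšťúůýž",
    "polish": "ąćęłńóśźż",
}


def build_union_vocab(base, extras):
    """Merge the base alphabet and all language extras (lower and upper case)
    into one string in which every character occurs exactly once."""
    seen = {}  # dict keys keep first-seen order and drop duplicates
    for chars in [base, *extras.values()]:
        # Note: "ß".upper() is "SS" in Python, which only re-adds "S".
        for ch in chars + chars.upper():
            seen.setdefault(ch, None)
    return "".join(seen)


def random_word(vocab, min_chars, max_chars, rng):
    """Draw a random 'word' of min_chars..max_chars characters from vocab."""
    length = rng.randint(min_chars, max_chars)
    return "".join(rng.choice(vocab) for _ in range(length))


vocab = build_union_vocab(BASE_LATIN, LANGUAGE_EXTRAS)
rng = random.Random(0)  # fixed seed so the sample words are reproducible
samples = [random_word(vocab, 3, 8, rng) for _ in range(5)]
print(len(vocab), samples)
```

Random strings like these only exercise visual recognition of the characters. They carry no language statistics, which matches the "visual recognition and not semantic understanding" caveat above.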

Jun 28 '22 15:06 frgfm

Hi @Xargonus 👋 any updates?

Sep 04 '22 17:09 felixdittrich92

Hi @Xargonus 👋, just asking again whether there are any updates from your side? :)

Sep 27 '22 13:09 felixdittrich92

Closing because there has been no response.

Jun 25 '23 21:06 felixdittrich92