Text2Image isn't working properly
I'm trying to retrain this Tesseract model (https://gitlab.com/pninim.org/tessdata_heb_rashi/-/blob/main/tesseract_4.1.1/TRAINING.md) for a specific, obscure Hebrew script, using Tesseract 5. Following the command listed there, I'm trying to get a list of available fonts using text2image --list_available_fonts --fonts_dir FontsRashi/Working, which initially worked but has since stopped working.
Environment
- Tesseract Version: 5.0.0
- Commit Number:
- Platform: 64 Bit Fedora 35
Current Behavior: Displays the following warning, followed by what seems to be a list of the fonts installed on the system:
(process:98484): Pango-CRITICAL **: 23:45:52.231: pango_font_description_set_size: assertion 'size >= 0' failed
Expected Behavior: Lists the fonts available in the given directory.
Suggested Fix: No idea. I need help troubleshooting this issue. The expected behavior worked until very recently, even though as far as I can tell I am still using the same install (I built from source and don't remember which commit I used).
Below are some photos relevant to the error.
I am afraid we can't do anything here: text2image uses Pango for working with fonts, so if the problem is in Pango, it must be fixed there. Anyway, a simple test case (a font plus a short text for generating an image) and information about the library versions could help with debugging...
Maybe related issue: https://github.com/amitdo/text2tif-2016/issues/5.
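For the library-version information requested above, a small collector could look like the sketch below. It assumes Bash and pkg-config; the function name collect_versions is hypothetical. Note that pkg-config reports the versions of the installed development files, which may differ from the libraries text2image was actually linked against, so the text2image --version output (which, as shown later in this thread, includes the Pango version) is printed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print version info useful for a text2image bug report.
collect_versions() {
  local lib v
  if command -v text2image >/dev/null 2>&1; then
    # As shown in this thread, this output also includes a "Pango version:" line.
    text2image --version 2>&1
  else
    echo "text2image: not found in PATH"
  fi
  # pkg-config reports the versions of the installed development files,
  # which may differ from the libraries text2image was linked against.
  for lib in pango pangocairo cairo fontconfig freetype2; do
    if command -v pkg-config >/dev/null 2>&1 \
       && v=$(pkg-config --modversion "$lib" 2>/dev/null); then
      echo "$lib: $v"
    else
      echo "$lib: unknown"
    fi
  done
}
```

Usage: collect_versions > versions.txt, then attach the file to the report together with the font and the short sample text.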
text2image --list_available_fonts --fonts_dir FontsRashi/Working
This is the wrong usage of the command.
It should be:
text2image --list_available_fonts --fonts_dir=/fullpath/to/FontsRashi/Working
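A minimal sketch of that suggestion, assuming Bash: resolve the (possibly relative) font directory to a canonical absolute path first, then pass it with the --fonts_dir=... form. The function names abs_dir and list_fonts_in are hypothetical. Note that in a later report in this thread an absolute path alone did not make text2image read the folder, so this only rules out one variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the canonical absolute path of an existing directory.
# "cd -P" + "pwd -P" resolve ".." segments and symlinks; CDPATH is cleared so
# cd never prints anything itself. Fails if the directory does not exist.
abs_dir() {
  (CDPATH= cd -P -- "$1" && pwd -P)
}

# Hypothetical wrapper: list fonts, always passing an absolute --fonts_dir=... value.
list_fonts_in() {
  local dir
  dir=$(abs_dir "$1") || { echo "no such directory: $1" >&2; return 1; }
  text2image --list_available_fonts --fonts_dir="$dir"
}
```

Usage: list_fonts_in FontsRashi/Working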
Maybe related issue: amitdo/text2tif-2016#5.
:-)
It seems that the list we produce may contain fonts that Pango can't render (fonts that are not OTF/TTF).
Maybe there is a way to filter these fonts out.
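As a rough pre-check outside text2image, one could separate the TrueType/OpenType files in the font directory from everything else. This is a hypothetical sketch assuming Bash and find; it judges the format by file extension only and does not inspect the font data or the list that text2image prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print .ttf/.otf files (case-insensitive) under a directory, sorted.
list_ttf_otf() {
  find "$1" -type f \( -iname '*.ttf' -o -iname '*.otf' \) | LC_ALL=C sort
}

# Hypothetical helper: print all other files, i.e. candidates Pango may fail to
# render. This is an extension-based heuristic, not a check of the font data.
list_other_fonts() {
  find "$1" -type f ! \( -iname '*.ttf' -o -iname '*.otf' \) | LC_ALL=C sort
}
```

Usage: list_other_fonts FontsRashi/Working shows which files might need converting or moving out of the directory before training.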
I think I have run into the same issue, in the sense that --fonts_dir
is not used at all! I tried with a freshly built tesseract:
yoh@typhon:~/proj/repronim/tesseract-train$ text2image --version
Using CAIRO_FONT_TYPE_FT.
Pango version: 1.50.12
5.3.1-22-g24da4
To list the fonts I have in a folder:
yoh@typhon:~/proj/repronim/tesseract-train$ strace -f -o /tmp/123 text2image --list_available_fonts --fonts_dir ../siemens-fonts/selected/ >/dev/null
(process:3654728): Pango-CRITICAL **: 11:04:36.387: pango_font_description_set_size: assertion 'size >= 0' failed
and it doesn't even look into that folder:
yoh@typhon:~/proj/repronim/tesseract-train$ grep siemens /tmp/123
3654728 execve("/home/yoh/proj/misc/tesseract/build/bin/text2image", ["text2image", "--list_available_fonts", "--fonts_dir", "../siemens-fonts/selected/"], 0x7ffc067c20d0 /* 27 vars */) = 0
Try using the full path to the font directory instead of a relative path.
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ rm /tmp/123-2; strace -s 1024 -f -o /tmp/123-2 text2image --list_available_fonts --fonts_dir $PWD/../../siemens-fonts/fonts/ | nl | tail
(process:4152228): Pango-CRITICAL **: 14:53:09.019: pango_font_description_set_size: assertion 'size >= 0' failed
124 123: Symbola Semi-Condensed
125 124: URW Bookman Light
126 125: URW Bookman Light Italic
127 126: URW Bookman Semi-Bold
128 127: URW Bookman Semi-Bold Italic
129 128: URW Gothic
130 129: URW Gothic Oblique
131 130: URW Gothic Semi-Bold
132 131: URW Gothic Semi-Bold Oblique
133 132: Z003 Medium Italic
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ grep sieme /tmp/123-2
4152228 execve("/home/yoh/proj/misc/tesseract/build/bin/text2image", ["text2image", "--list_available_fonts", "--fonts_dir", "/home/yoh/proj/repronim/tesseract-train/tesseract_tutorial/../../siemens-fonts/fonts/"], 0x7fff0768a7c0 /* 27 vars */) = 0
4152228 write(3, "<?xml version=\"1.0\"?>\n<!DOCTYPE fontconfig SYSTEM \"fonts.dtd\">\n<fontconfig>\n<dir>/home/yoh/proj/repronim/tesseract-train/tesseract_tutorial/../../siemens-fonts/fonts/</dir>\n<cachedir></cachedir>\n<config></config>\n</fontconfig>\n", 227) = 227
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ grep 'O_WRONLY.*= 3$' /tmp/123-2
4152228 openat(AT_FDCWD, "fonts.conf", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
So it only shows that text2image creates fonts.conf locally with that path;
otherwise the folder is never accessed.
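To check, independently of text2image, whether fontconfig can see the fonts in that folder at all, one could write a minimal config containing only that directory (similar to the fonts.conf the strace shows text2image writing) and point fc-list at it through fontconfig's FONTCONFIG_FILE environment variable. A sketch assuming Bash and the fontconfig command-line tools; the function names are hypothetical. If fc-list prints nothing with this config, that would suggest the fonts themselves are not being loaded, rather than the path being mishandled by text2image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal fontconfig config that knows only one
# font directory plus a private cache dir. Assumes the paths contain no XML
# special characters such as & or <.
write_fonts_conf() {
  local font_dir=$1 cache_dir=$2 out=$3
  cat > "$out" <<EOF
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<dir>$font_dir</dir>
<cachedir>$cache_dir</cachedir>
</fontconfig>
EOF
}

# Hypothetical diagnostic: list what fontconfig alone finds in the directory,
# bypassing text2image and the system-wide configuration.
fc_list_dir() {
  local dir tmp
  dir=$(CDPATH= cd -P -- "$1" && pwd -P) || return 1
  tmp=$(mktemp -d)
  write_fonts_conf "$dir" "$tmp/cache" "$tmp/fonts.conf"
  FONTCONFIG_FILE="$tmp/fonts.conf" fc-list : family file
  rm -rf -- "$tmp"
}
```

Usage: fc_list_dir ../siemens-fonts/fonts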