chafa [Proposal] Optimal custom fonts using machine learning

The fonts we use day to day are not designed for printing images. A custom font is a workable way to improve character art, as the BE256 and BE512 fonts have shown.

The highest resolution that chafa supports is 8x8, going by its glyph bitmaps. That gives 2^64 possible patterns, far more than the whole Unicode table can hold. However, my bold assumption is that most of the patterns in this 2^64 space are useless.
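To put that gap in numbers, here is a quick check. The only figure not taken from the proposal is the size of the Unicode code space (U+0000..U+10FFFF), which is a fixed property of Unicode.

```python
# Number of distinct 8x8 two-color bitmaps: one bit per cell.
patterns = 2 ** 64

# Size of the whole Unicode code space, U+0000..U+10FFFF.
unicode_code_points = 0x110000

print(patterns)             # 18446744073709551616
print(unicode_code_points)  # 1114112

# How many candidate patterns compete for each available code point.
print(patterns // unicode_code_points)
```

Even if every code point were free for glyphs, only a vanishing fraction of the patterns could be assigned, which is why the proposal selects N representatives instead.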

To find the N most useful patterns in this space, we can take advantage of machine learning.

The proposed procedure for creating such a custom font looks like this:

sample M random crops (with aspect ratio w:h = 1:2) from an image dataset
turn the M crops into binarized bitmaps using their histograms and downsample them to 8x8
find N cluster centers in the space using the k-means algorithm, with the M binarized 64x1 vectors as the dataset
convert the N vectors into a C bitmap header and SVG plots
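The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual generator: it uses only the standard library, synthetic random crops instead of a real image dataset, a mean threshold as a stand-in for the histogram-based binarization, and a plain Lloyd's k-means loop. All function names here are hypothetical.

```python
import random

def binarize_8x8(crop):
    """Downsample a grayscale crop (list of rows) to 8x8 by block averaging,
    then threshold at the mean. The mean threshold is an assumption standing
    in for the histogram-based binarization in the proposal."""
    h, w = len(crop), len(crop[0])
    bh, bw = h // 8, w // 8
    cells = []
    for r in range(8):
        for c in range(8):
            block = [crop[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / 64
    return [1.0 if v > mean else 0.0 for v in cells]

def kmeans(vectors, n, iters=10, seed=0):
    """Plain Lloyd's k-means on 64-dimensional vectors; returns n centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n)]
    for _ in range(iters):
        groups = [[] for _ in range(n)]
        for v in vectors:
            best = min(range(n),
                       key=lambda k: sum((a - b) ** 2
                                         for a, b in zip(v, centers[k])))
            groups[best].append(v)
        for k, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[k] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def to_glyph(center):
    """Round a cluster center back to an 8-row bitmap of 'X' and ' '."""
    bits = ["X" if v >= 0.5 else " " for v in center]
    return ["".join(bits[r * 8:(r + 1) * 8]) for r in range(8)]

# Synthetic stand-in for M crops with w:h = 1:2 (8 wide, 16 tall).
rng = random.Random(1)
crops = [[[rng.randint(0, 255) for _ in range(8)] for _ in range(16)]
         for _ in range(200)]
vectors = [binarize_8x8(c) for c in crops]
glyphs = [to_glyph(c) for c in kmeans(vectors, n=4)]
print("\n".join(glyphs[0]))
```

With real crops and a much larger N, each rounded center would become one glyph in the C header and one SVG plot.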

Highlights:

Best resolution.
Easy to code. Automatic font and C code generation.
I don't know what glyph is good for printing character art, but the algorithm can figure it out.

Assignee: myself

Oct 02 '18 08:10 cdluminate

Sounds like an interesting project. We could definitely ship such a font with Chafa and support it with --symbols.

Oct 03 '18 16:10 hpjansson

The goals of this issue have now become:

find the optimal set of glyphs
documentation
autotools: add option to enable this feature

Jan 10 '19 15:01 cdluminate

Let's move forward a bit: update the font generator and really ship a (basically working) font file?

Aug 07 '19 03:08 cdluminate

Let's move forward a bit: update the font generator and really ship a (basically working) font file?

Sounds good to me.

I don't like to keep generated blobs in git, but it's important to keep the build and installation process simple, so here's what I think we should do:

Keep the JSON from the last good training run in git, and also the TTF. Both gzipped.
Have Makefile rule for generating $(JSON).gz -> $(TTF).gz, disabled by default. Configure switch kind of like --enable-gtk-doc. This depends on Python + modules (e.g. fontforge).
Install the TTF file and chafa8x8.py always (maybe call chafa8x8.py something else to avoid confusion).

We may also have to keep the generated C source (also gzipped), but I'd prefer to load it dynamically, either from the JSON or the TTF. I'll have to see what's the best solution here, maybe we should be using a more compact format instead of JSON.

How does that sound?

Aug 07 '19 18:08 hpjansson

Sounds good to me.

maybe we should be using a more compact format instead of JSON.

Any suggestion? I think JSON is just the format of best compatibility, even if not the most compact one.

Besides, we can add a configure flag called, e.g. --enable-kmeans-font which triggers the TTF file generation, and defines a C macro like CHAFA_HAS_KMEANS_FONT:

#ifdef CHAFA_HAS_KMEANS_FONT
#include "auto-generated.c"
#endif

Aug 08 '19 02:08 cdluminate

Yeah, I was just thinking about the overhead of parsing about a megabyte of JSON on startup. However, I think we could go for a better solution where Chafa just loads the glyphs right out of the font file. That would be optimal, since we could adapt to any font and get better results in e.g. ascii mode too. It should be enough to link with FreeType or Harfbuzz, which are very common and low in the stack.

Aug 08 '19 21:08 hpjansson

Chafa would then have a switch, e.g. --font-glyphs fontfile.ttf which would import glyphs from a font and map them to their respective Unicode code points. You could even specify it multiple times. Then there would be --symbols range where you could specify that you want to allow custom code point ranges, e.g. the one used in the k-means font. Using those two switches together you would get the desired result.

Aug 08 '19 22:08 hpjansson

Cool! I like this idea.

Aug 08 '19 23:08 cdluminate

It's in master now. Here's how to use it:

chafa --glyph-file chafa8x8.ttf --symbols 0x100000..0x101000

The font loading is a little bit slow, and I need to fine tune the bitmap generator, but I'm already getting improved output with e.g. chafa --glyph-file ter-x12n.pcf --symbols all where the font file corresponds to the Terminus font I'm using in the terminal.

Aug 29 '19 00:08 hpjansson

Nice. Now I think the C code generation part can be safely removed from fontgen. I will submit a PR to overhaul fontgen when I have enough time to work on it.

Aug 29 '19 00:08 cdluminate

This looks fun. Grabbed a bunch of images and put them in ~/coco

$ ./chafa8x8.py CreateDataset --glob "coco/*.jpg"
Traceback (most recent call last):
  File "/media/sd/Projects/TermFun/chafedit/tools/fontgen/./chafa8x8.py", line 15, in <module>
    from sklearn.cluster import KMeans, MiniBatchKMeans
ModuleNotFoundError: No module named 'sklearn'

alright then

$ pip3 install sklearn
Requirement already satisfied: sklearn in /usr/local/lib/python3.10/dist-packages (0.0.post1)
$ pip3 install KMeans
Requirement already satisfied: KMeans in /usr/local/lib/python3.10/dist-packages (1.0.2)
$ pip3 install MiniBatchKMeans
ERROR: Could not find a version that satisfies the requirement MiniBatchKMeans (from versions: none)
ERROR: No matching distribution found for MiniBatchKMeans

Ah yes, python is satan. nevermind.

FWIW for my textart, braille already gives a 2x4 matrix that resolves everything quite well. Particularly if your font uses 'full block' braille glyphs.

Beyond that resolution there are few gains due to the two-color limitation.

Jan 10 '23 17:01 clort81

I think one could further develop this code to generate wedge shapes and such. It's been a while since I tried it, though. Maybe the dependencies are out of date (or the required packages are only available on Debian?).

Jan 10 '23 17:01 hpjansson

I think one could further develop this code to generate wedge shapes and such. It's been a while since I tried it, though. Maybe the dependencies are out of date (or the required packages are only available on Debian?).

"It's been a while since I tried it": same for me. I thought about rewriting the code in the past but did not find a good reason to, given the good sixel support in some modern terminals. But I still like the idea and it's fun. My code uses the standard libraries of the machine learning community (scikit-learn). It's just a little tricky for someone unfamiliar with machine learning packages to discover that import sklearn in fact refers to the scikit-learn package: https://scikit-learn.org/stable/

Maybe I should write a requirements.txt file for dependencies?
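Until such a file exists, a small pre-flight check could point users at the right package names. The sketch below is only an illustration: the hint table is an assumption assembled from this thread (sklearn is installed as scikit-learn; fontforge comes from the distribution package python3-fontforge on Debian bullseye and newer, since pip cannot find it), and missing_modules is a hypothetical helper, not part of chafa8x8.py.

```python
import importlib.util

# Import name -> how to install it. The sklearn and fontforge entries follow
# this thread; the table itself is an illustrative assumption.
INSTALL_HINTS = {
    "numpy": "pip3 install numpy",
    "sklearn": "pip3 install scikit-learn",
    "fontforge": "apt install python3-fontforge (pip cannot find it)",
}

def missing_modules(hints=INSTALL_HINTS):
    """Return (module, hint) pairs for modules that cannot be imported."""
    return [(name, hint) for name, hint in hints.items()
            if importlib.util.find_spec(name) is None]

for name, hint in missing_modules():
    print(f"missing module {name!r}: try {hint}")
```

Running this before the real imports would replace a bare ModuleNotFoundError with an actionable message.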

Jan 10 '23 18:01 cdluminate

Maybe I should write a requirements.txt file for dependencies?

Oh, that would be great! Or maybe expand its README.md a little?

Jan 10 '23 18:01 hpjansson

Thanks cdluminate! Installing scikit-learn as user worked.

I got this far

>  ./chafa8x8.py CreateDataset --glob ./coco/*.jpg

This gives a long file list to stderr but doesn't create a file.

./chafa8x8.py Clustering
=> loading dataset from chafa8x8.npz
Traceback (most recent call last):
  File "/media/sd/Projects/TermFun/chafa/tools/fontgen/./chafa8x8.py", line 232, in <module>
    eval(f'main{sys.argv[1]}')(sys.argv[2:])
  File "/media/sd/Projects/TermFun/chafa/tools/fontgen/./chafa8x8.py", line 95, in mainClustering
    dataset = np.load(ag.dataset)['dataset']
              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'chafa8x8.npz'

I see the Python script has a --save option, but using it didn't create a .npz file. How do I generate the .npz?

Feb 05 '23 02:02 clort81

Note: don't let your shell expand the wildcard *.jpg. Quote the pattern so that the script receives it intact. The correct command in your case is:

>  ./chafa8x8.py CreateDataset --glob './coco/*.jpg'
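Why the quotes matter can be shown with a small argparse sketch. The parser below is hypothetical and only mirrors a single-valued --glob option; the real argument handling in chafa8x8.py may differ, and the file names are made up.

```python
import argparse

# Hypothetical parser mirroring a single-valued --glob option; the real
# chafa8x8.py argument handling may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--glob", type=str)

# Quoted: the shell passes the pattern through as one argument.
quoted, leftover_quoted = parser.parse_known_args(["--glob", "./coco/*.jpg"])
print(quoted.glob, leftover_quoted)

# Unquoted: the shell has already expanded the pattern into many file names,
# so --glob receives only the first one and the rest are left over.
expanded, leftover_expanded = parser.parse_known_args(
    ["--glob", "./coco/a.jpg", "./coco/b.jpg", "./coco/c.jpg"])
print(expanded.glob, leftover_expanded)
```

In the unquoted case the script never sees a pattern at all, only one file name plus stray arguments.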

The output will look like this

Feb 05 '23 02:02 cdluminate

1290508 Feb 5 03:07 chafa8x8.npz

Worked! Sorry for the oversight. Thanks!

Feb 05 '23 02:02 clort81

I'm currently running my code to see whether it can be updated. I'm also updating the README. You should be able to see the updates, maybe within the next hour.

Feb 05 '23 02:02 cdluminate

$ ./chafa8x8.py GenA
 -> number of centers: 4633
=> Result saved to chafa8x8.json
Traceback (most recent call last):
  File "svg2ttf.py", line 4, in <module>
    import fontforge as ff
ImportError: No module named fontforge

Alright logical, you use some fontforge lib to generate the ttf...

$ pip3 install fontforge
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement fontforge (from versions: none)
ERROR: No matching distribution found for fontforge

Well now we're in hell again aren't we...

$ curl https://bootstrap.pypa.io/get-pip.py | python
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2514k  100 2514k    0     0  1249k      0  0:00:02  0:00:02 --:--:-- 1250k
Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-23.0-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 1.3 MB/s eta 0:00:00
Installing collected packages: pip
Successfully installed pip-23.0
$ which pip
/usr/local/bin/pip
$ pip install fontforge
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement fontforge (from versions: none)
ERROR: No matching distribution found for fontforge

Searching the web for a solution, I see a version can be specified

$ python3 -m pip install --pre --upgrade PACKAGE==VERSION.VERSION.VERSION
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement PACKAGE==VERSION.VERSION.VERSION (from versions: 0.1.1)

Oh is version 0.1.1 the right one?

$ python3 -m pip install --pre --upgrade PACKAGE==0.1.1
Defaulting to user installation because normal site-packages is not writeable
Collecting PACKAGE==0.1.1
  Downloading package-0.1.1.tar.gz (13 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-0tfmb0j0/package_bbae3602879f4652831549991f005883/setup.py", line 4
          print """
          ^^^^^^^^^
      SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

That smells like it's using python 2 for some reason yes?

What fontforge are you using and how do i install it?

What python version does it need? I'm at 3.11

Cheers

Feb 05 '23 02:02 clort81

Please take a look at my latest pull request https://github.com/hpjansson/chafa/pull/128. Specifically:

Note, in order to generate a usable font, the python-fontforge (for older Debian systems) or the python3-fontforge (for Debian bullseye and newer) package has to be installed as well. It will be used in the ./chafa8x8.py GenA step. It will automatically invoke chafa8x8.py GenFont subcommand for creating the font.

I knew you would run into issues with Python 2 :-)

Feb 05 '23 02:02 cdluminate

GPU clustering is supported now: https://github.com/hpjansson/chafa/pull/128/commits/03dcc8e8de6c9f6ee199d46341d85db1a754330f

We are able to have fun with large datasets as long as an Nvidia GPU is available.

Feb 05 '23 03:02 cdluminate

I generated a sample font from a small dataset: chafa8x8.zip. The archive includes the JSON file and the TTF file.

It contains 4791 glyphs. The Unicode range is 0x100000..0x1012b6. But when I try the following command, the output is fully black. What did I miss? (I installed the font.)

chafa xxx.png --glyph-file /tmp/chafa8x8.ttf --symbols 0x100000..0x1012b6
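As a sanity check on that range, the end point follows from the glyph count, assuming the glyphs occupy consecutive code points starting at 0x100000. The bounds of Supplementary Private Use Area-B are a Unicode fact, not something stated in this thread.

```python
first = 0x100000
glyph_count = 4791

# Assumes glyph IDs map to consecutive code points starting at `first`.
last = first + glyph_count - 1
print(f"{first:#x}..{last:#x}")  # 0x100000..0x1012b6

# Supplementary Private Use Area-B spans U+100000..U+10FFFD.
in_private_use = 0x100000 <= first and last <= 0x10FFFD
print(in_private_use)  # True
```

So the range passed to --symbols is consistent with the glyph count, and the problem has to lie elsewhere.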

Feb 05 '23 04:02 cdluminate

I generated another sample font from a large dataset (COCO 2017 validation set): http://images.cocodataset.org/zips/val2017.zip

Steps to reproduce (Debian bullseye):

./chafa8x8.py CreateDataset --glob 'val2017/*.jpg' -Mc 500 (the resulting dataset size is 2.5 million)
./chafa8x8.py Clustering -B faiss (takes 120.21 seconds on an Nvidia RTX 2060 mobile GPU; with the sklearn backend on a Xeon CPU it could take more than 12 hours, IIRC)
./chafa8x8.py GenA

Result: chafa8x8-coco2017val.zip

Feb 05 '23 04:02 cdluminate

@hpjansson Are there any detailed instructions on how to use a custom font? (Maybe the manpage description for --glyph-file should be expanded a little.) I'm unable to make it work. I was able to use the font with the custom glyph header, but not with the --glyph-file argument. I must have missed something?

Feb 05 '23 05:02 cdluminate

With current master, the easiest way is to use --glyph-file chafa8x8.ttf --symbols imported. But --symbols 0x100000..0x1012b6 should also work. Sometimes it takes a while for the display server to find the font, and some terminals have to be restarted (VTE will find the new font and update itself after a while).

Feb 06 '23 02:02 hpjansson

There's also a hidden option you can use: chafa --dump-glyph-file chafa8x8.ttf will tell you what Chafa thinks the font looks like after internal postprocessing.

Feb 06 '23 02:02 hpjansson

I'm using some VTE-based terminals (tilix, gnome-terminal) and Qt-based terminals (konsole, yakuake). It seems that VTE-based terminals require the font to be installed into the system directory /usr/share/fonts/. Currently, in my VTE terminals, I can see the glyphs correctly in the output of chafa --dump-glyph-file. But when printing an image, the result is still fully black. I've patched the Python code to remove the width=0 and vwidth=0 lines.

The glyph at 0x101079 is shown correctly, but image output still does not work.

Feb 06 '23 03:02 cdluminate

Strange. I was able to get the font picked up when copied into $HOME/.fonts/.

Feb 06 '23 03:02 hpjansson

Meanwhile, the results of chafa --dump-glyph-file somehow differ from chafa8x8.h, which is accurate.

In chafa8x8.h, the first several glyphs are:

{
    /* Chafa8x8 Font, ID: 1, Unicode: 0x100001 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100001,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "      X "
},
{
    /* Chafa8x8 Font, ID: 2, Unicode: 0x100002 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100002,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "     X  "
},
{
    /* Chafa8x8 Font, ID: 3, Unicode: 0x100003 */
    CHAFA_SYMBOL_TAG_CUSTOM,
    0x100003,
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "        "
    "    X   "
},

The positions of the three X marks, as (row, column) counted from 1, are (8, 7), (8, 6) and (8, 5). Let's see the dump:

    {
        /* [􀀁] */
        CHAFA_SYMBOL_TAG_,
        0x100001,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },
    {
        /* [􀀂] */
        CHAFA_SYMBOL_TAG_,
        0x100002,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },
    {
        /* [􀀃] */
        CHAFA_SYMBOL_TAG_,
        0x100003,
        CHAFA_SYMBOL_OUTLINE_8X8 (
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "        "
            "   X    ")
    },

In the dump, the positions are (8, 4), (8, 4) and (8, 4).
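The comparison can be reproduced mechanically. The sketch below re-types the last row of each glyph from the two listings above and reports 1-based (row, column) positions; set_pixels is a hypothetical helper written for this illustration, not a Chafa function.

```python
def set_pixels(rows):
    """Return 1-based (row, column) positions of every 'X' in a glyph."""
    return [(r, c)
            for r, line in enumerate(rows, start=1)
            for c, ch in enumerate(line, start=1)
            if ch == "X"]

EMPTY = " " * 8

# Glyphs as listed in chafa8x8.h: seven empty rows plus the last row.
header_glyphs = {
    0x100001: [EMPTY] * 7 + ["      X "],
    0x100002: [EMPTY] * 7 + ["     X  "],
    0x100003: [EMPTY] * 7 + ["    X   "],
}

# Glyphs as reported by the dump: all three share the same last row.
dump_glyphs = {cp: [EMPTY] * 7 + ["   X    "] for cp in header_glyphs}

for cp in header_glyphs:
    print(f"{cp:#x}", set_pixels(header_glyphs[cp]), set_pixels(dump_glyphs[cp]))
```

The header glyphs differ from one another while the dumped ones collapse to a single column, so distinct single-pixel glyphs become indistinguishable after import.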

Feb 06 '23 03:02 cdluminate

The dump for the last several glyphs matches chafa8x8.h.

Feb 06 '23 03:02 cdluminate