FreeTypeAbstraction.jl icon indicating copy to clipboard operation
FreeTypeAbstraction.jl copied to clipboard

Better font-finding heuristics, with a shortcut and a few caches.

Open tecosaur opened this issue 1 year ago • 8 comments

I initially started looking at this package when I found that it doesn't always look in the right directories for fonts (see #82). However, I recently found that the heuristic for font finding itself is both slow (#67) and a bit dodgy (#83).

So, I've gone through the font-finding code and made a few changes:

  • The scoring heuristic takes a few more factors into account
    • Earlier components of a search string are now weighted slightly higher (e.g. the plex in ibm plex sans italic is worth more than sans)
    • Particular regular styles will be picked over others (e.g. regular over medium)
    • Certain font formats are prioritised (e.g. otf over pfb)
  • ~~Introduced a considered shortcut when scoring each font file~~
    • ~~The list of matching font-file is pre-sorted according to matches of the search string in the font file name~~
    • ~~We calculate the maximum possible family and style score given the search string~~
    • ~~We return the current best font early when:~~
      • ~~We have found a font that maximum score~~
      • ~~We have seen more than twice as many fonts as the last font with the maximum score seen~~
  • Introduced a few (runtime) caches
    • A cache of the font file names, without the extension, as lowercase
    • A cache of the family, style, and extension of font files
    • A cache of the resolved font for given searches (invalidated by directory modification times)

From the test that I've performed locally, this batch of changes results in faster, better initial lookups, and faster (again) subsequent lookups.


Here are some test results on my machine:

Search string Current time PR time Current result (family, style) PR result
ibm plex sans bold italic 1.5s 0.06s AlegreyaSans-BoldItalic, Bold IBM Plex Sans, Bold Italic
ibm sans 1.5s 0.04s IBM Plex Sans, Regular (same)
sans 1.5s 0.02s KpSans, Regular DejaVu Sans, Book
hack 1.5s 0.02s Hack, Regular (same)
computer modern 1.5s 0.03s Computer Modern, Roman Computer Modern, Medium
schoolbook 1.5s 0.03s Century Schoolbook L, Roman Adobe New Century Schoolbook, Medium
medium euler 1.5s 0.08s Alegreya-Medium, Medium Euler, Medium
bold slanted roman 1.5s 0.04s Latin Modern Roman Slanted, 10 Bold (same)

Subsequent identical searches take ~0.0002s.

If more people could perform adversarial (but not pathological) tests with this scheme, that would be much appreciated.

tecosaur avatar Jan 31 '24 11:01 tecosaur

There was some concern raised about using the file-sorting based shortcut. I've just had a look at instead baking the fontfile sort-info into the precompilation step, and that seems to work a treat.

tecosaur avatar Jan 31 '24 11:01 tecosaur

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (077003e) 95.26% compared to head (3ce7edf) 95.53%.

Files Patch % Lines
src/findfonts.jl 95.74% 2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   95.26%   95.53%   +0.26%     
==========================================
  Files           6        6              
  Lines         317      336      +19     
==========================================
+ Hits          302      321      +19     
  Misses         15       15              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jan 31 '24 11:01 codecov[bot]

If we're caching paths at precompilation time, that will pose issues for relocatability I think.. Maybe this should not be done at that point but at init. It could be made async so that it doesn't delay package loading, but then some locking mechanism would need to be in place for usage.

jkrumbiegel avatar Jan 31 '24 20:01 jkrumbiegel

If we're caching paths at precompilation time, that will pose issues for relocatability I think...

Relocatability does make this interesting, but I think we need to be clearer on what the potential "issues" might be. In this case, the primary consequence of a precompiled value that doesn't match the actual system is stale cache entries which will be replaced/ignored at runtime. I.e. this doesn't affect correctness, but might affect some performance considerations.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

tecosaur avatar Jan 31 '24 21:01 tecosaur

Yes the issue would be that the baked in fonts in the dict probably do not transfer to another system. So you wouldn't gain much speed there.

Do you know of any precompile result relocation happening in practice? It could help to have a more concrete example to discuss this in the context of.

At work, we use sysimages compiled on CI runners and transferred to other runners. There we run into relocatability issues all the time.

jkrumbiegel avatar Feb 01 '24 07:02 jkrumbiegel

Right. So it sounds like what we really want is a per-system font cache file?

Would Scatch.jl be the way to go then?

tecosaur avatar Feb 01 '24 07:02 tecosaur

@jkrumbiegel any further thoughts on this?

tecosaur avatar Feb 07 '24 03:02 tecosaur

Not really further, I also think using a scratch space might be the reasonable way to go. Store a file with all the names and paths there and only repopulate the info when files change.

jkrumbiegel avatar Feb 07 '24 16:02 jkrumbiegel