opentype.js

Don't load `hmtx` table until it's actually needed

Status: Open. Pomax opened this issue 10 years ago • 8 comments

Parsing the hmtx table loads every glyph's advanceWidth and leftSideBearing values up front, which is fine for small fonts, but for large fonts it puts a lot of data in memory.

Instead of loading all data during hmtx.parseHmtxTable, called from opentype.js:parseBuffer, it would be much nicer to only load in a glyph's values the first time those values are actually needed. The glyph prototype could be assigned default advanceWidth and leftSideBearing values with getter/setters, such that when they are requested, they then consult HMTX and bind the values onto the glyph instance, which then shadows the prototype values:

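```javascript
Object.defineProperty(Glyph.prototype, 'advanceWidth', {
  configurable: true,
  // First read: bind the hmtx values onto the instance, then return the own property.
  get: function() { this.setMetricsFromHMTX(); return this.advanceWidth; },
  // Writes define an own data property that shadows this accessor
  // (a plain `this.advanceWidth = v` here would re-enter the setter forever).
  set: function(v) {
    Object.defineProperty(this, 'advanceWidth', { value: v, writable: true, enumerable: true, configurable: true });
  }
});
```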

Before any values are bound, the prototype will be hit, but once the instance has these values bound, it won't hit the prototype anymore. A demonstrator of the concept: http://jsbin.com/zopigokeho/edit?js

Pomax, Jun 21 '15 19:06
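The prototype-accessor idea from the issue description can be sketched as a small self-contained demo. The `HMTX` array, the `Glyph` class and the `bindOwn` helper are hypothetical stand-ins, not opentype.js code; `setMetricsFromHMTX` only borrows its name from the issue. The point is to show that the first read goes through the prototype getter, binds own data properties, and every later read bypasses the getter entirely.

```javascript
// Hypothetical stand-in for parsed hmtx data: [advanceWidth, leftSideBearing] per glyph.
const HMTX = [[500, 10], [620, 25], [580, 15]];
let hmtxLookups = 0; // counts how often the (normally costly) hmtx lookup runs

// Define a plain, writable data property directly on the instance.
function bindOwn(obj, name, value) {
  Object.defineProperty(obj, name, {
    value, writable: true, enumerable: true, configurable: true,
  });
}

class Glyph {
  constructor(index) {
    this.index = index;
  }

  // Hypothetical helper (name taken from the issue): look up this glyph's metrics
  // once and bind them onto the instance, shadowing the prototype accessors.
  // Metrics the caller already set explicitly are left alone.
  setMetricsFromHMTX() {
    hmtxLookups++;
    const [advanceWidth, leftSideBearing] = HMTX[this.index];
    const own = (name) => Object.prototype.hasOwnProperty.call(this, name);
    if (!own('advanceWidth')) bindOwn(this, 'advanceWidth', advanceWidth);
    if (!own('leftSideBearing')) bindOwn(this, 'leftSideBearing', leftSideBearing);
  }
}

// Prototype accessors: only reached while the instance has no own property yet.
for (const name of ['advanceWidth', 'leftSideBearing']) {
  Object.defineProperty(Glyph.prototype, name, {
    configurable: true,
    get() {
      this.setMetricsFromHMTX();
      return this[name]; // now resolves to the own data property
    },
    set(value) {
      bindOwn(this, name, value); // explicit writes also shadow the accessor
    },
  });
}

const g = new Glyph(1);
console.log(Object.prototype.hasOwnProperty.call(g, 'advanceWidth')); // false
console.log(g.advanceWidth, g.leftSideBearing); // 620 25
console.log(hmtxLookups); // 1
```

Both metrics are bound in one lookup, so reading `leftSideBearing` after `advanceWidth` costs nothing extra, and a value assigned before the first read is not overwritten by the later lazy lookup.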

I like this.

fdb, Jun 22 '15 08:06

Could be nice; just be aware that putting a getter on a heavily used property of every glyph could affect performance.

fpirsch, Sep 26 '15 10:09

Depends on how poorly you implement it. An array-backed object that grows itself as it's consulted is perfectly fine. The initial hit will be as slow as the data traversal needed to find the correct metric value, but after that it's the same speed as accessing any other LUT.

(and finding the correct metric is pretty fast, given the binary layout of OpenType fonts)

I already did this for glyphs in https://github.com/nodebox/opentype.js/pull/131, the idea was to simply port the same approach to hmtx. Unfortunately, I've not been able to get to this for a long time.

Pomax, Sep 26 '15 18:09
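The "array-backed object that grows itself as it's consulted" can be sketched directly against the hmtx byte layout from the OpenType spec: `numberOfHMetrics` records of {uint16 advanceWidth, int16 lsb}, followed by bare int16 left side bearings for the remaining glyphs, which reuse the last advanceWidth. Any single glyph is therefore one offset computation away. The `LazyHmtx` class and its method names are hypothetical, and the caller is assumed to already know the hmtx offset, `numberOfHMetrics` (from hhea) and `numGlyphs` (from maxp).

```javascript
// Sketch: O(1) random access into a raw hmtx table, plus a lazily filled cache.
class LazyHmtx {
  constructor(view, offset, numberOfHMetrics, numGlyphs) {
    this.view = view;         // DataView over the font bytes
    this.offset = offset;     // byte offset of the hmtx table
    this.numberOfHMetrics = numberOfHMetrics;
    // Flat typed arrays: 5 bytes per glyph instead of one object per glyph.
    this.advanceWidths = new Uint16Array(numGlyphs);
    this.leftSideBearings = new Int16Array(numGlyphs);
    this.loaded = new Uint8Array(numGlyphs); // 1 once a glyph's metrics are cached
    this.reads = 0;           // number of times the raw bytes were consulted
  }

  // Read one glyph's metrics straight from the buffer; no other glyph is touched.
  // DataView reads are big-endian by default, matching OpenType.
  readMetrics(glyphIndex) {
    this.reads++;
    const v = this.view, base = this.offset, n = this.numberOfHMetrics;
    if (glyphIndex < n) {
      const p = base + glyphIndex * 4;
      return { advanceWidth: v.getUint16(p), leftSideBearing: v.getInt16(p + 2) };
    }
    return {
      advanceWidth: v.getUint16(base + (n - 1) * 4), // last full record's width
      leftSideBearing: v.getInt16(base + n * 4 + (glyphIndex - n) * 2),
    };
  }

  getMetrics(glyphIndex) {
    if (!this.loaded[glyphIndex]) {
      const m = this.readMetrics(glyphIndex);
      this.advanceWidths[glyphIndex] = m.advanceWidth;
      this.leftSideBearings[glyphIndex] = m.leftSideBearing;
      this.loaded[glyphIndex] = 1;
    }
    return {
      advanceWidth: this.advanceWidths[glyphIndex],
      leftSideBearing: this.leftSideBearings[glyphIndex],
    };
  }
}

// Synthetic hmtx table: 2 full records, then 2 bare left side bearings.
const dv = new DataView(new ArrayBuffer(12));
dv.setUint16(0, 500); dv.setInt16(2, 10);
dv.setUint16(4, 600); dv.setInt16(6, -20);
dv.setInt16(8, 5); dv.setInt16(10, -7);
const hmtx = new LazyHmtx(dv, 0, 2, 4);
console.log(hmtx.getMetrics(3)); // { advanceWidth: 600, leftSideBearing: -7 }
```

The typed arrays are allocated for every glyph up front but stay compact; a `Map` would allocate only for glyphs actually used, at a higher cost per entry. Which wins depends on how many glyphs a document touches.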

I was just talking about simple property access vs a getter function call. Benchmarking the rendering of lots of glyphs with and without such a getter would be great.

fpirsch, Sep 26 '15 19:09
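A rough starting point for the benchmark suggested above: time many reads of `advanceWidth` through a plain data property versus through a prototype getter. The classes are hypothetical, the glyph count and round count are arbitrary, and it assumes a runtime with a global `performance` (browsers, Node 16+). Micro-benchmark numbers vary a lot between engines and runs, so no expected timings are given.

```javascript
// Rough micro-benchmark sketch: data-property reads vs. prototype-accessor reads.
class PlainGlyph {
  constructor(w) { this.advanceWidth = w; }
}

class AccessorGlyph {
  constructor(w) { this._w = w; }
  get advanceWidth() { return this._w; }
}

function sumWidths(glyphs, rounds) {
  let total = 0;
  for (let r = 0; r < rounds; r++) {
    for (let i = 0; i < glyphs.length; i++) total += glyphs[i].advanceWidth;
  }
  return total;
}

function bench(label, glyphs, rounds) {
  sumWidths(glyphs, 10); // warm-up so the JIT has seen the loop before timing
  const t0 = performance.now();
  const total = sumWidths(glyphs, rounds);
  console.log(`${label}: ${(performance.now() - t0).toFixed(1)} ms`);
  return total;
}

const N = 5000, ROUNDS = 200;
const plain = Array.from({ length: N }, (_, i) => new PlainGlyph(500 + (i % 100)));
const accessor = Array.from({ length: N }, (_, i) => new AccessorGlyph(500 + (i % 100)));
// Both runs share one loop, so the second may see a polymorphic property access;
// swapping the order is a cheap sanity check on the numbers.
const plainTotal = bench('data property', plain, ROUNDS);
const accessorTotal = bench('accessor', accessor, ROUNDS);
```

With the shadowing approach from the issue, a glyph whose metrics have been loaded carries its own data properties, so steady-state reads are plain property reads; only the first access pays for the getter.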

The nice thing about a getter is that things are still JS, so with some clever code you can actually overwrite a getter with the static binding once you have all the data in. But, agreed! Benchmarks are always good.

Pomax, Sep 27 '15 05:09

What we are saying here is that there are two use cases for a font:

  • general font information, which should parse only parts of the file
  • glyph rendering, which requires parsing the full font file.

This could also be implemented with options passed to the opentype.parse() method, or with a new method such as opentype.parseHeaders().

fpirsch, Sep 27 '15 10:09

Actually, I think things are a bit more complex than just the two cases. If I'm only interested in drawing properly kerned text, a lot of stuff gets loaded into memory that I don't care to retain. I can imagine a "low-memory mode" wherein we're just trying to "carve" through the values as fast as possible, like cutting through the jungle with a machete (e.g. in head parse just unitsPerEm and indexToLocFormat; in maxp parse just numGlyphs, ...). This was an approach I saw in pdf.js, and I really like it. FreeType also does something similar: it leaves values in the buffer unparsed and looks them up when needed (e.g. cmap).

I'm not sure if all of these approaches can be consolidated in one library, but I think it would be interesting to have a "font info" and "full" mode, at least.

fdb, Sep 27 '15 19:09
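The "carve through the values" idea can be sketched as a header-only pass: read the table directory, then pull just head.unitsPerEm, head.indexToLocFormat, maxp.numGlyphs and hhea.numberOfHMetrics at their fixed offsets from the OpenType spec, leaving every other byte unparsed. `readTableDirectory` and `carveFontInfo` are hypothetical helper names, and the sketch assumes a plain TrueType/OpenType file (not WOFF or a font collection) and skips validation such as checksums and missing tables.

```javascript
// Read the sfnt table directory: 12-byte header, then 16-byte table records.
function readTableDirectory(view) {
  const numTables = view.getUint16(4);
  const tables = {};
  for (let i = 0; i < numTables; i++) {
    const rec = 12 + i * 16;
    const tag = String.fromCharCode(
      view.getUint8(rec), view.getUint8(rec + 1), view.getUint8(rec + 2), view.getUint8(rec + 3));
    // Record layout: tag, checksum, offset, length (4 bytes each).
    tables[tag] = { offset: view.getUint32(rec + 8), length: view.getUint32(rec + 12) };
  }
  return tables;
}

// Pull only the handful of fields needed up front; everything else stays in the buffer.
function carveFontInfo(buffer) {
  const view = new DataView(buffer);
  const tables = readTableDirectory(view);
  const head = tables['head'].offset;
  const maxp = tables['maxp'].offset;
  const hhea = tables['hhea'].offset;
  return {
    tables, // kept so later lookups (e.g. hmtx) can jump straight to their data
    unitsPerEm: view.getUint16(head + 18),
    indexToLocFormat: view.getInt16(head + 50),
    numGlyphs: view.getUint16(maxp + 4),
    numberOfHMetrics: view.getUint16(hhea + 34),
  };
}
```

In Node, something like `carveFontInfo(new Uint8Array(fs.readFileSync(file)).buffer)` would work; copying into a fresh `Uint8Array` avoids a pooled `Buffer` whose underlying `ArrayBuffer` does not start at the font's first byte.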

Actually, glyph rendering definitely should not require parsing the full font file (no native font engine does this, either). Loading entire fonts into memory will blow up memory instantly for large fonts, like CJK fonts that implement 10,000 to 30,000 glyphs (which is pretty much any CJK font), and it is noticeably detrimental even for fonts like Times New Roman (3400 glyphs), Arial (3500 glyphs), or open source fonts like FreeSans (5300 glyphs) or DejaVu (6000 glyphs).

The absolute last thing a font parser needs to do there is blindly load everything into memory, regardless of whether any of the glyphs and associated metadata are going to end up being used. Instead, they should have a fast traversal route for the font's byte layout (which the spec is optimized for) to look up the data necessary for shaping code point sequences, and only cache things as they are being used, so that you have the lowest possible memory footprint, with the fastest possible shaping.

Uniscribe, DirectWrite, FreeType 2, HarfBuzz, etc. all make sure to do this, because running through a font's byte layout, even "from disk" rather than memory mapped, is fast (in many cases literally just following values as relative pointers), and it makes it possible to do shaping with fonts that don't even fit in memory.

JavaScript doesn't have the luxury of running "from file", so it has to create a memory map and usually needs a prototype that operates on that map. But creating a fully parsed object in memory uses far more memory than is desirable. On mobile devices, but also in desktop browsers and even on servers, where "more memory" costs "more money", keeping memory use down is incredibly important: you don't want a tab that uses 200MB for a single font in which most glyphs are never accessed, or are accessed so infrequently that caching them makes no sense.

Practical example: I couldn't use opentype.js to generate some multiple-typeface images of Japanese ideographs using 9 different CJK fonts, precisely because it was loading everything into memory instead of only what was actually being used. Instead of using maybe twice the size of the font file in memory, the object representation created by JavaScript used 2GB+ of memory. (After the glyph lookup/caching PR, that has gone down to about 320MB: still a lot, but far more manageable.)

Pomax, Sep 27 '15 21:09
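"Only cache things as they are being used" still leaves the question of what to do with glyphs that are used once and never again. One possible policy, sketched here, is a size-capped least-recently-used cache in front of whatever routine decodes a single glyph. `GlyphCache` and `parseGlyphAt` are hypothetical placeholders, not opentype.js APIs; the sketch relies on `Map` iterating keys in insertion order.

```javascript
// Sketch: parse glyphs on first use, keep at most `maxEntries` of them,
// and evict the least recently used one when the cap is exceeded.
class GlyphCache {
  constructor(parseGlyphAt, maxEntries) {
    this.parseGlyphAt = parseGlyphAt; // hypothetical: decodes one glyph from the font bytes
    this.maxEntries = maxEntries;
    this.entries = new Map();         // glyphIndex -> parsed glyph, least recent first
  }

  get(glyphIndex) {
    if (this.entries.has(glyphIndex)) {
      const glyph = this.entries.get(glyphIndex);
      this.entries.delete(glyphIndex); // re-insert to mark as most recently used
      this.entries.set(glyphIndex, glyph);
      return glyph;
    }
    const glyph = this.parseGlyphAt(glyphIndex);
    this.entries.set(glyphIndex, glyph);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      this.entries.delete(this.entries.keys().next().value);
    }
    return glyph;
  }
}

let parses = 0;
const cache = new GlyphCache((i) => { parses++; return { index: i }; }, 2);
cache.get(1); cache.get(2); cache.get(1); // glyph 1 is now the most recently used
cache.get(3);                             // evicts glyph 2
console.log([...cache.entries.keys()], parses); // [ 1, 3 ] 3
```

The cap trades memory for re-parsing: a glyph evicted and requested again is decoded from the buffer a second time, which stays cheap as long as single-glyph lookups are cheap.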