[codegen] capture initial fixed fields in a struct
This is a fairly large update to codegen. It is technically breaking but will only affect users that reference the TableRef type explicitly.
The change basically captures the sequence of "fixed" fields for each table into a separate zerocopy struct (fixed is defined as always present and existing at a constant offset). At read time, we parse a reference to this struct and store it as a new field on TableRef. Any subsequent access to the fields from the fixed set simply reads from that reference.
Note that we still continue to validate the trailing conditional/variable size fields at this point. We may want to reconsider that as a next step.
Presuming this has been validated by profiling, it sounds good from a high level.
Thinking it through, though, I wonder if it would make more sense to put these fields into the relevant 'marker' structure, instead of adding a new field? My sense is that this should also let us remove fields from the marker struct, which we should do if we're able.
Quite a while ago I started to wonder if the stuff stored in the marker struct itself really made sense; that is, is it actually a win to precompute the various offsets that are stored there, instead of just computing them on the fly? especially if a bunch of that computation will just involve accessing values on the struct..
These fields can't be added directly to the marker struct because we're reading the entire thing by reference so it needs to match the layout of the underlying font data.
I've thought a lot about the trailing fields and the offsets stored in the marker struct. One of the issues is that if we don't compute these then all reads following the fixed field set become fallible. If we're going to validate all the fields but not store the offsets then that's just wasted work. I also think subsetting depends on some of this stuff now.
On HR benchmarks, this alone seems to buy about 4-5% which is worthwhile.
I'm not seeing a speedup, but ~3% slowdown. I ran it multiple times. Humm.
Comparing before to after
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-thelittleprince.txt/harfrust +0.0196 +0.0195 189 193 188 192
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-words.txt/harfrust +0.0163 +0.0182 214 218 213 217
BM_Shape/Amiri-Regular.ttf/fa-thelittleprince.txt/harfrust +0.0344 +0.0351 68 70 67 70
BM_Shape/NotoSansDevanagari-Regular.ttf/hi-words.txt/harfrust +0.0502 +0.0506 42 44 41 44
BM_Shape/Roboto-Regular.ttf/en-thelittleprince.txt/harfrust +0.0416 +0.0426 21 22 21 22
BM_Shape/Roboto-Regular.ttf/en-words.txt/harfrust +0.0375 +0.0383 29 30 29 30
BM_Shape/SourceSerifVariable-Roman.ttf/react-dom.txt/harfrust +0.0070 +0.0067 237 238 236 237
OVERALL_GEOMEAN +0.0294 +0.0300 0 0 0 0
So I've managed to set up some criterion benchmarks on HarfRust (will PR soon) and this is what I'm getting at the moment for this branch vs 0.31.1:
fonts/Roboto-Regular.ttf/texts/en-thelittleprince.txt
time: [20.352 ms 20.363 ms 20.375 ms]
change: [−2.4867% −2.4041% −2.3165%] (p = 0.00 < 0.05)
Performance has improved.
fonts/Amiri-Regular.ttf/texts/fa-thelittleprince.txt
time: [56.631 ms 56.716 ms 56.801 ms]
change: [−5.9951% −5.8422% −5.6772%] (p = 0.00 < 0.05)
Performance has improved.
I'm guessing further investigation is needed.
Is there an increased setup cost per shape call? I wonder if https://github.com/harfbuzz/harfrust/issues/115 is relevant.
ah, yes, that might be a relevant difference. I don't think this PR should increase the setup cost though.
So, after this landed: https://github.com/harfbuzz/harfrust/pull/133 I actually see this PR speed up our Arabic fonts a good deal (5 to 9%), but slow down Roboto 2 / 3%. I'm in favor of merging and continuing optimizations.
Comparing before to after
Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-thelittleprince.txt/harfrust -0.0816 -0.0811 126 116 125 115
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-words.txt/harfrust -0.0949 -0.0950 143 129 142 129
BM_Shape/Amiri-Regular.ttf/fa-thelittleprince.txt/harfrust -0.0503 -0.0497 54 51 53 51
BM_Shape/NotoSansDevanagari-Regular.ttf/hi-words.txt/harfrust +0.0148 +0.0145 33 33 33 33
BM_Shape/Roboto-Regular.ttf/en-thelittleprince.txt/harfrust +0.0215 +0.0224 20 20 20 20
BM_Shape/Roboto-Regular.ttf/en-words.txt/harfrust +0.0375 +0.0372 26 27 26 27
BM_Shape/SourceSerifVariable-Roman.ttf/react-dom.txt/harfrust +0.0047 +0.0050 215 216 214 215
OVERALL_GEOMEAN -0.0225 -0.0222 0 0 0 0
So this one is bothering me because the performance should be neutral at worst but we're seeing some slowdowns. My current guess is that we're increasing the size of every TableRef by 8 bytes to store the additional pointer and this is increasing stack and register pressure leading to worse codegen.
Additionally, data.as_ptr() is always equal to fixed_fields.as_ptr() so storing the extra pointer is entirely unnecessary but impossible to avoid in safe Rust. What I'm going to try is adding a new type to font-types that uses unsafe to coalesce these and we'll see if that improves performance.
Also, we store 64 bit offsets in the marker structs but table sizes are limited to 32 bits so we can potentially cut the size of those in half as well. I'll look into that later.