fontations icon indicating copy to clipboard operation
fontations copied to clipboard

[codegen] capture initial fixed fields in a struct

Open dfrg opened this issue 4 months ago • 9 comments

This is a fairly large update to codegen. It is technically breaking but will only affect users that reference the TableRef type explicitly.

The change basically captures the sequence of "fixed" fields for each table into a separate zerocopy struct (fixed is defined as always present and existing at a constant offset). At read time, we parse a reference to this struct and store it as a new field on TableRef. Any subsequent access to the fields from the fixed set simply reads from that reference.

Note that we still continue to validate the trailing conditional/variable size fields at this point. We may want to reconsider that as a next step.

dfrg avatar Aug 06 '25 15:08 dfrg

Presuming this has been validated by profiling, it sounds good from a high level.

Thinking it through, though, I wonder if it would make more sense to put these fields into the relevant 'marker' structure, instead of adding a new field? My sense is that this should also let us remove fields from the marker struct, which we should do if we're able.

Quite a while ago I started to wonder if the stuff stored in the marker struct itself really made sense; that is, is it actually a win to precompute the various offsets that are stored there, instead of just computing them on the fly? especially if a bunch of that computation will just involve accessing values on the struct..

cmyr avatar Aug 06 '25 17:08 cmyr

These fields can't be added directly to the marker struct because we're reading the entire thing by reference so it needs to match the layout of the underlying font data.

I've thought a lot about the trailing fields and the offsets stored in the marker struct. One of the issues is that if we don't compute these then all reads following the fixed field set become fallible. If we're going to validate all the fields but not store the offsets then that's just wasted work. I also think subsetting depends on some of this stuff now.

On HR benchmarks, this alone seems to buy about 4-5% which is worthwhile.

dfrg avatar Aug 06 '25 17:08 dfrg

I'm not seeing a speedup, but ~3% slowdown. I ran it multiple times. Humm.

Comparing before to after
Benchmark                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-thelittleprince.txt/harfrust                +0.0196         +0.0195           189           193           188           192
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-words.txt/harfrust                          +0.0163         +0.0182           214           218           213           217
BM_Shape/Amiri-Regular.ttf/fa-thelittleprince.txt/harfrust                           +0.0344         +0.0351            68            70            67            70
BM_Shape/NotoSansDevanagari-Regular.ttf/hi-words.txt/harfrust                        +0.0502         +0.0506            42            44            41            44
BM_Shape/Roboto-Regular.ttf/en-thelittleprince.txt/harfrust                          +0.0416         +0.0426            21            22            21            22
BM_Shape/Roboto-Regular.ttf/en-words.txt/harfrust                                    +0.0375         +0.0383            29            30            29            30
BM_Shape/SourceSerifVariable-Roman.ttf/react-dom.txt/harfrust                        +0.0070         +0.0067           237           238           236           237
OVERALL_GEOMEAN                                                                      +0.0294         +0.0300             0             0             0             0

behdad avatar Aug 06 '25 18:08 behdad

So I've managed to set up some criterion benchmarks on HarfRust (will PR soon) and this is what I'm getting at the moment for this branch vs 0.31.1:

fonts/Roboto-Regular.ttf/texts/en-thelittleprince.txt
                        time:   [20.352 ms 20.363 ms 20.375 ms]
                        change: [−2.4867% −2.4041% −2.3165%] (p = 0.00 < 0.05)
                        Performance has improved.

fonts/Amiri-Regular.ttf/texts/fa-thelittleprince.txt
                        time:   [56.631 ms 56.716 ms 56.801 ms]
                        change: [−5.9951% −5.8422% −5.6772%] (p = 0.00 < 0.05)
                        Performance has improved.

I'm guessing further investigation is needed.

dfrg avatar Aug 06 '25 22:08 dfrg

Is there an increased setup cost per shape call? I wonder if https://github.com/harfbuzz/harfrust/issues/115 is relevant.

behdad avatar Aug 06 '25 22:08 behdad

ah, yes, that might be a relevant difference. I don't think this PR should increase the setup cost though.

dfrg avatar Aug 06 '25 22:08 dfrg

So, after this landed: https://github.com/harfbuzz/harfrust/pull/133 I actually see this PR speed up our Arabic fonts a good deal (5 to 9%), but slow down Roboto 2 / 3%. I'm in favor of merging and continuing optimizations.

Comparing before to after
Benchmark                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-thelittleprince.txt/harfrust                -0.0816         -0.0811           126           116           125           115
BM_Shape/NotoNastaliqUrdu-Regular.ttf/fa-words.txt/harfrust                          -0.0949         -0.0950           143           129           142           129
BM_Shape/Amiri-Regular.ttf/fa-thelittleprince.txt/harfrust                           -0.0503         -0.0497            54            51            53            51
BM_Shape/NotoSansDevanagari-Regular.ttf/hi-words.txt/harfrust                        +0.0148         +0.0145            33            33            33            33
BM_Shape/Roboto-Regular.ttf/en-thelittleprince.txt/harfrust                          +0.0215         +0.0224            20            20            20            20
BM_Shape/Roboto-Regular.ttf/en-words.txt/harfrust                                    +0.0375         +0.0372            26            27            26            27
BM_Shape/SourceSerifVariable-Roman.ttf/react-dom.txt/harfrust                        +0.0047         +0.0050           215           216           214           215
OVERALL_GEOMEAN                                                                      -0.0225         -0.0222             0             0             0             0

behdad avatar Aug 12 '25 01:08 behdad

So this one is bothering me because the performance should be neutral at worst but we're seeing some slowdowns. My current guess is that we're increasing the size of every TableRef by 8 bytes to store the additional pointer and this is increasing stack and register pressure leading to worse codegen.

Additionally, data.as_ptr() is always equal to fixed_fields.as_ptr() so storing the extra pointer is entirely unnecessary but impossible to avoid in safe Rust. What I'm going to try is adding a new type to font-types that uses unsafe to coalesce these and we'll see if that improves performance.

dfrg avatar Aug 12 '25 13:08 dfrg

Also, we store 64 bit offsets in the marker structs but table sizes are limited to 32 bits so we can potentially cut the size of those in half as well. I'll look into that later.

dfrg avatar Aug 12 '25 13:08 dfrg