fontations
[read-fonts] Speed up ValueFormat size calculation
In my Roboto HarfRust profile, I see 0.8% of the time spent in this function in value_record.rs:
/// Return the number of bytes required to store a [`ValueRecord`] in this format.
#[inline]
pub fn record_byte_len(self) -> usize {
    self.bits().count_ones() as usize * u16::RAW_BYTE_LEN
}
As it happens, we're only interested in counting the set bits of a u8. In HarfBuzz I optimized this by writing a custom function instead of relying on the compiler intrinsic:
/* Return the number of 1 bits in a uint8_t; faster than hb_popcount() */
static inline unsigned
hb_popcount8 (uint8_t v)
{
  static const uint8_t popcount4[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
  };
  return popcount4[v & 0xF] + popcount4[v >> 4];
}
The custom function wins because, unless you build for your native arch, compilers can't assume a hardware popcount instruction and instead emit a branchless sequence of about 10 ops. I found that this replacement gives a measurable and reliable speedup.
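For discussion, here is a rough sketch of what a port of the nibble-table approach could look like in Rust. The names `POPCOUNT4`, `popcount8` and the free function `record_byte_len_from_bits` are made up for illustration and are not existing read-fonts APIs; the sketch takes the raw format bits as a `u16` rather than a `ValueFormat`. Whether this actually beats `count_ones()` in HarfRust would need to be confirmed by profiling.

```rust
/// Number of set bits in each 4-bit value, indexed by that value.
/// (Hypothetical name, mirroring the `popcount4` table in the C version.)
const POPCOUNT4: [u8; 16] = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

/// Count the set bits of a `u8` using two table lookups,
/// one for the low nibble and one for the high nibble.
#[inline]
pub fn popcount8(v: u8) -> usize {
    (POPCOUNT4[(v & 0xF) as usize] + POPCOUNT4[(v >> 4) as usize]) as usize
}

/// Illustrative stand-in for `record_byte_len`: takes the raw format bits
/// and counts only the low 8 bits (the `as u8` cast truncates the rest).
#[inline]
pub fn record_byte_len_from_bits(format_bits: u16) -> usize {
    // Each present field is stored as one 16-bit value.
    popcount8(format_bits as u8) * core::mem::size_of::<u16>()
}
```

The table has 16 entries, so indexing with a masked or shifted `u8` nibble is always in bounds.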
Caveat, from the comment in the HarfBuzz source:
// Note: spec says skip 2 bytes per bit in the valueformat. But reports
// from Microsoft developers indicate that only the fields that are
// currently defined are counted. We don't expect any new fields to
// be added to ValueFormat. As such, we use the faster hb_popcount8
// that only processes the lowest 8 bits.
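To make the caveat concrete, the sketch below compares the two readings on raw `u16` format values. The function names and the `DEFINED_FIELDS` constant are hypothetical and only serve the illustration; the mask reflects the eight ValueFormat fields currently defined, which occupy the low byte. The two readings agree for any format that uses only defined fields and differ only when a reserved high bit is set.

```rust
/// Mask covering the eight ValueFormat fields currently defined (the low byte).
/// (Hypothetical constant for this sketch.)
const DEFINED_FIELDS: u16 = 0x00FF;

/// Literal reading of the spec: every set bit in the 16-bit format
/// contributes 2 bytes to the record.
pub fn byte_len_all_bits(format: u16) -> usize {
    format.count_ones() as usize * 2
}

/// Reading described in the comment above: only the currently defined
/// fields are counted, so reserved high bits are ignored.
pub fn byte_len_defined_only(format: u16) -> usize {
    (format & DEFINED_FIELDS).count_ones() as usize * 2
}
```

For example, a format of `0x0005` gives 4 bytes under both readings, while `0x0105` (one reserved bit set) gives 6 bytes under the literal reading and 4 bytes when only defined fields are counted.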