qoi-rust
qoi-rust copied to clipboard
Hash index and use is slower then just encoding
Hi,
I was doing a benchmark of saving thumbnail sized images (think 400x400 max) and was profiling that in ... Turns out of the encoding time quite a sizable portion is spend in encode_impl and within that function this if/else is hot:
hash_prev = px_rgba.hash_index();
let index_px = &mut index[hash_prev as usize];
if *index_px == px_rgba {
buf = buf.write_one(QOI_OP_INDEX | hash_prev)?;
} else {
*index_px = px_rgba;
buf = px.encode_into(px_prev, buf)?;
}
The hash logic (the if) and the calculation of the hash (hash_index) together seem to be more expensive then just not doing hashing at all.
Now we're still talking about something that is fast to just a little faster (from 7.000.000 instructions to ~6.000.000 instructions) so from that point of view it might not even be worth much consideration. However, it also simplifies the code quite a bit to just remove it and it's faster. That should be worth a consideration!
My encode_impl without this looks as follows:
#[allow(clippy::cast_possible_truncation, unused_assignments, unused_variables)]
fn encode_impl<W: Writer, const N: usize>(mut buf: W, data: &[u8]) -> Result<usize>
where
Pixel<N>: SupportedChannels,
[u8; N]: Pod,
{
let cap = buf.capacity();
let mut px_prev = Pixel::new().with_a(0xff);
let mut run = 0_u8;
let mut px = Pixel::<N>::new().with_a(0xff);
let n_pixels = data.len() / N;
for (i, chunk) in data.chunks_exact(N).enumerate() {
px.read(chunk);
if px == px_prev {
run += 1;
if run == 62 || unlikely(i == n_pixels - 1) {
buf = buf.write_one(QOI_OP_RUN | (run - 1))?;
run = 0;
}
} else {
if run != 0 {
buf = buf.write_one(QOI_OP_RUN | (run - 1))?;
run = 0;
}
let px_rgba = px.as_rgba(0xff);
buf = px.encode_into(px_prev, buf)?;
px_prev = px;
}
}
buf = buf.write_many(&QOI_PADDING)?;
Ok(cap.saturating_sub(buf.capacity()))
}