qoi-rust Hash index and use is slower then just encoding

Hash index and use is slower then just encoding

Open markg85 opened this issue 11 months ago • 0 comments

Hi,

I was doing a benchmark of saving thumbnail sized images (think 400x400 max) and was profiling that in ... Turns out of the encoding time quite a sizable portion is spend in encode_impl and within that function this if/else is hot:

            hash_prev = px_rgba.hash_index();
            let index_px = &mut index[hash_prev as usize];
            if *index_px == px_rgba {
                buf = buf.write_one(QOI_OP_INDEX | hash_prev)?;
            } else {
                *index_px = px_rgba;
                buf = px.encode_into(px_prev, buf)?;
            }

The hash logic (the if) and the calculation of the hash (hash_index) together seem to be more expensive then just not doing hashing at all.

Now we're still talking about something that is fast to just a little faster (from 7.000.000 instructions to ~6.000.000 instructions) so from that point of view it might not even be worth much consideration. However, it also simplifies the code quite a bit to just remove it and it's faster. That should be worth a consideration!

My encode_impl without this looks as follows:

#[allow(clippy::cast_possible_truncation, unused_assignments, unused_variables)]
fn encode_impl<W: Writer, const N: usize>(mut buf: W, data: &[u8]) -> Result<usize>
where
    Pixel<N>: SupportedChannels,
    [u8; N]: Pod,
{
    let cap = buf.capacity();

    let mut px_prev = Pixel::new().with_a(0xff);
    let mut run = 0_u8;
    let mut px = Pixel::<N>::new().with_a(0xff);
    let n_pixels = data.len() / N;

    for (i, chunk) in data.chunks_exact(N).enumerate() {
        px.read(chunk);
        if px == px_prev {
            run += 1;
            if run == 62 || unlikely(i == n_pixels - 1) {
                buf = buf.write_one(QOI_OP_RUN | (run - 1))?;
                run = 0;
            }
        } else {
            if run != 0 {
                buf = buf.write_one(QOI_OP_RUN | (run - 1))?;
                run = 0;
            }
            let px_rgba = px.as_rgba(0xff);
            buf = px.encode_into(px_prev, buf)?;
            px_prev = px;
        }
    }

    buf = buf.write_many(&QOI_PADDING)?;
    Ok(cap.saturating_sub(buf.capacity()))
}

Dec 21 '24 17:12 markg85

qoi-rust qoi-rust copied to clipboard

Hash index and use is slower then just encoding

qoi-rust
qoi-rust copied to clipboard