
Decode regression

Open · daulet opened this issue 1 year ago · 13 comments

This is not a recent regression, and perhaps it won't be fixed for that reason, but I thought I'd file it anyway.

I maintain Go bindings for this library, and by sheer luck I had benchmarks in place from the start. At some point I noticed a regression in decoding, but only now got around to investigating it. Long story short, I bisected this repo and root-caused the regression to this PR. Below is the benchmark I used to find it.

Regression details:

decode                  time:   [3.9277 µs 3.9409 µs 3.9558 µs]
                        change: [+241.37% +242.64% +244.06%] (p = 0.00 < 0.05)
                        Performance has regressed.

While decode is pretty fast (on the order of microseconds), a +240% slowdown is fairly big, and I wonder if we can gain back that performance.

Benchmark code (tokenizers/benches/decode_benchmark.rs):

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tokenizers::tokenizer::Tokenizer;

// Decode a sequence of token ids into a string. Recent releases of
// tokenizers take the ids as a slice (&[u32]), hence the borrow below;
// older releases took the Vec by value.
fn decode(tokenizer: &Tokenizer, ids: Vec<u32>, skip_special_tokens: bool) -> String {
    tokenizer
        .decode(&ids, skip_special_tokens)
        .expect("failed to decode input")
}

fn criterion_benchmark(c: &mut Criterion) {
    let tokenizer = Tokenizer::from_file("./test/data/bert-base-uncased.json")
        .expect("failed to create tokenizer");
    c.bench_function("decode", |b| {
        b.iter(|| {
            decode(
                &tokenizer,
                // black_box keeps the compiler from optimizing the input away
                black_box(vec![2829, 4419, 14523, 2058, 1996, 13971, 3899]),
                black_box(true),
            )
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Add this to Cargo.toml and run with cargo bench decode.

[[bench]]
name = "decode_benchmark"
harness = false
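
Note that criterion also needs to be declared as a dev-dependency if it isn't already; something like the following (the version here is an assumption, use whatever recent release you prefer):

[dev-dependencies]
criterion = "0.5" # assumed version; any recent release should work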

The tokenizer file is copied from here.

daulet · Jul 10 '24 13:07

Oh wow, thanks a mile for the clear report! Yeah, 240% is not really good! Okay, I'll investigate, but in the meantime, if you have a PR with a fix, that would be super good!

ArthurZucker · Jul 12 '24 13:07

I'll add this to benches! 🆙

ArthurZucker · Jul 12 '24 13:07

I checked out the commit you mentioned; I had to create a custom branch to test: test-old-decode. I cherry-picked the test that I added in fast-decode-fix, which has just the benchmark. Will post the results here!

ArthurZucker · Jul 12 '24 14:07

[screenshot of benchmark results]

for now seems like the issue was fixed in between

ArthurZucker · Jul 12 '24 14:07

"for now seems like the issue was fixed in between"

You mean you don't see a regression at head? I just synced up to confirm the performance drop. Don't look at the change %, as that seems to report the change since the last run of the benchmark; just compare absolute values. For example, here is what I got on two consecutive runs at head:

decode                  time:   [3.8542 µs 3.8795 µs 3.9159 µs]
                        change: [+234.61% +236.83% +240.08%] (p = 0.00 < 0.05)
                        Performance has regressed.
decode                  time:   [3.7643 µs 3.7762 µs 3.7891 µs]
                        change: [-3.8222% -2.8277% -2.1576%] (p = 0.00 < 0.05)
                        Performance has improved.
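
(Side note: assuming a reasonably recent criterion, you can make the change % meaningful across branches by using named baselines: run cargo bench -- --save-baseline before on one commit, then cargo bench -- --baseline before on the other to compare against it.)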

daulet · Jul 13 '24 05:07

I'll check again! I was getting 630 ns for both branches, but let me make sure of that!

ArthurZucker · Jul 13 '24 09:07

Oh, I just checked the test-old-code branch, and it contains the commit (#938) that introduced the regression. You want to branch from one commit earlier than that :)

daulet avatar Jul 13 '24 13:07 daulet

Yeah, I cherry-picked it, I think, to run before/after. I'll check again.

ArthurZucker · Jul 16 '24 11:07

Since encode perf was improved in the 0.20 release, any plans to look at this?

daulet · Aug 09 '24 22:08

Yep, for sure 😉 As you've seen, I was more focused on encode, but I will pick this back up.

ArthurZucker · Aug 12 '24 05:08

@ArthurZucker any updates? :)

daulet · Nov 05 '24 18:11

@ArthurZucker You were not able to reproduce the regression because the tokenizer in the test-old-code branch is missing a decoder and is not the original one supplied in this issue. I can reproduce the slowdown, but short of changing the API, perhaps to one based on iterators, there doesn't seem to be a simple fix. The decoder API changed to return a Vec<String> instead of a String, so some regression is expected.

With the changes from #1707, the regression is a bit lower, around +210%.
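
For intuition, here is a minimal sketch (hypothetical names, not the actual tokenizers code) of why a path that materializes a Vec<String> tends to cost more than building a single String:

fn decode_joined(pieces: &[&str]) -> String {
    // Single output buffer; allocations are amortized as it grows.
    let mut out = String::new();
    for p in pieces {
        out.push_str(p);
    }
    out
}

fn decode_chunked(pieces: &[&str]) -> String {
    // Roughly the shape of an API that returns Vec<String>: one heap
    // allocation per piece, plus a final concatenation into the result.
    let chunks: Vec<String> = pieces.iter().map(|p| p.to_string()).collect();
    chunks.concat()
}

fn main() {
    let pieces = ["the", " quick", " brown", " fox"];
    assert_eq!(decode_joined(&pieces), decode_chunked(&pieces));
}

The per-piece allocations are the extra work; an iterator-based API, as suggested above, could hand pieces back without materializing them all up front.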

sftse · Dec 30 '24 13:12

Yep, sorry, I don't really have much bandwidth to go back to this right now! Super sorry for not answering sooner. Again, if anyone wants to have a look, it would help!

ArthurZucker · Jan 16 '25 16:01