deno_core
op2: Investigate `#[string]` performance for LARGE_1000000
op2
op2 `#[string]` arguments are faster than the old op strings in every case except LARGE_1000000 (1,000,000 ASCII characters). We need to investigate why.
```
test baseline ... bench: 878 ns/iter (+/- 96)
test bench_op_option_u32 ... bench: 49,690 ns/iter (+/- 22,588)
test bench_op_string ... bench: 18,925 ns/iter (+/- 2,256)
test bench_op_string_large_1000 ... bench: 297,396 ns/iter (+/- 41,330)
test bench_op_string_large_1000000 ... bench: 2,622,869 ns/iter (+/- 298,615)
test bench_op_string_large_utf8_1000 ... bench: 3,946,605 ns/iter (+/- 403,230)
test bench_op_string_large_utf8_1000000 ... bench: 38,985,146 ns/iter (+/- 2,266,213)
test bench_op_string_old ... bench: 19,870 ns/iter (+/- 2,354)
test bench_op_string_old_large_1000 ... bench: 246,036 ns/iter (+/- 40,192)
test bench_op_string_old_large_1000000 ... bench: 1,082,275 ns/iter (+/- 104,487)
test bench_op_string_old_large_utf8_1000 ... bench: 5,485,882 ns/iter (+/- 489,366)
test bench_op_string_old_large_utf8_1000000 ... bench: 51,652,968 ns/iter (+/- 3,158,678)
test bench_op_string_option_u32 ... bench: 82,449 ns/iter (+/- 10,669)
test bench_op_u32 ... bench: 4,508 ns/iter (+/- 575)
test bench_op_void ... bench: 5,054 ns/iter (+/- 419)
```
@mmastrac It seems this has been fixed in main?
```
test bench_op_string_large_utf8_1000000 ... bench: 15,772,187 ns/iter (+/- 462,927)
...
test bench_op_string_old_large_utf8_1000000 ... bench: 20,796,803 ns/iter (+/- 354,834)
```
The UTF-8 one is faster with op2, but for some reason the ASCII one is not. I think the benchmark has improved on main, but op2 is still slower there (I think by ~50%?).
Trimmed recent benchmark:
```
test bench_op_string_large_1000000 ... bench: 790,843 ns/iter (+/- 28,126)
test bench_op_string_old_large_1000000 ... bench: 471,671 ns/iter (+/- 70,252)
```
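A quick sanity check of the "~50%?" estimate against the trimmed numbers. This sketch assumes that the whole per-iteration time is spent on the 1,000,000-character string. Per-call op overhead is ignored, so the per-character figures slightly overstate the string cost. The helper names are illustrative only.

```rust
// Timings in ns/iter, copied from the trimmed benchmark run above.
const OP2_NS: f64 = 790_843.0;
const OLD_NS: f64 = 471_671.0;
// LARGE_1000000 passes 1,000,000 ASCII characters (per the issue description).
const CHARS: f64 = 1_000_000.0;

/// How many times longer the op2 path takes than the old path.
fn slowdown(new_ns: f64, old_ns: f64) -> f64 {
    new_ns / old_ns
}

/// Average cost per character, assuming all time is string handling.
fn ns_per_char(total_ns: f64, chars: f64) -> f64 {
    total_ns / chars
}

fn main() {
    let ratio = slowdown(OP2_NS, OLD_NS);
    // Prints the ratio and the percentage by which op2 is slower.
    println!("op2 / old = {:.2}x ({:.0}% slower)", ratio, (ratio - 1.0) * 100.0);
    println!(
        "op2: {:.2} ns/char, old: {:.2} ns/char",
        ns_per_char(OP2_NS, CHARS),
        ns_per_char(OLD_NS, CHARS)
    );
}
```

On this run the ratio works out to about 1.68x: op2 takes roughly two-thirds longer than the old path, at about 0.79 ns versus 0.47 ns per character.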
I wonder if we're just falling off some SIMD/autovectorization fast path?
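To make the fast-path idea concrete: string-handling code often gates on a bulk check such as "is every byte ASCII?" It runs that check several bytes at a time (or with SIMD) and drops to a per-character loop only when the check fails. The sketch below is a hypothetical, portable word-at-a-time ("SWAR") version of such a check. It is not taken from deno_core or V8, and nothing in this thread confirms which check, if any, either op path uses.

```rust
// Hypothetical illustration of a word-at-a-time ASCII check;
// not deno_core or V8 code.

/// Mask with the high bit of each of the 8 bytes in a u64 set.
const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

/// Returns true if every byte is ASCII (< 0x80).
fn is_ascii_swar(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(8);
    // Fast path: inspect 8 bytes per iteration; any set high bit means non-ASCII.
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        if (u64::from_ne_bytes(word) & HIGH_BITS) != 0 {
            return false;
        }
    }
    // Slow path: check the trailing 0..=7 bytes one at a time.
    chunks.remainder().iter().all(|&b| b < 0x80)
}

fn main() {
    let ascii = vec![b'a'; 1_000_000];
    let mut mixed = ascii.clone();
    mixed[999_999] = 0xE9; // a single Latin-1 'é' in the last position
    println!("{} {}", is_ascii_swar(&ascii), is_ascii_swar(&mixed));
}
```

In real code the standard library's `<[u8]>::is_ascii` is the better choice. The hand-rolled version only shows the shape of a fast path that bulk-checks the common case before falling back.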
Ah ok, here's the profile for each one:
- `bench_op_string_large_1000000`: https://share.firefox.dev/44fuKRX
- `bench_op_string_old_large_1000000`: https://share.firefox.dev/44WKIBf
It seems the fast call path is not taken in either case. The other difference is that the old op uses `WriteUtf8`, whereas op2 uses `WriteOneByte`.
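Some context on the `WriteUtf8` vs `WriteOneByte` difference. As far as we understand the V8 API, `WriteOneByte` copies each character of a string as a single byte (Latin-1), while `WriteUtf8` emits UTF-8 directly. Latin-1 bytes are valid UTF-8 only when every byte is ASCII. Code that starts from one-byte output therefore needs a fix-up step before it has a Rust `String`. The hypothetical `latin1_to_string` below sketches such a step; its name and structure are ours, not deno_core's.

```rust
// Hypothetical helper; not deno_core's actual conversion code.

/// Converts Latin-1 bytes (one byte per character, as a one-byte string
/// write produces) into a UTF-8 Rust `String`.
fn latin1_to_string(bytes: Vec<u8>) -> String {
    if bytes.is_ascii() {
        // ASCII bytes are identical in Latin-1 and UTF-8, so the buffer is
        // reused without copying. `from_utf8` still runs its own validation scan.
        String::from_utf8(bytes).expect("ASCII is always valid UTF-8")
    } else {
        // Each Latin-1 byte is the code point U+0000..=U+00FF; bytes >= 0x80
        // become 2-byte UTF-8 sequences, so the output may be longer.
        bytes.into_iter().map(|b| b as char).collect()
    }
}

fn main() {
    let ascii = latin1_to_string(b"hello".to_vec());
    let accented = latin1_to_string(vec![0x63, 0x61, 0x66, 0xE9]); // "café" in Latin-1
    println!("{} {} ({} bytes)", ascii, accented, accented.len());
}
```

The ASCII branch touches the data twice: once in `is_ascii` and again in the validation inside `String::from_utf8`. An implementation that wanted to skip the second pass could call the `unsafe` `String::from_utf8_unchecked` after its own check. Whether the op2 path makes extra passes like these is not established in this thread.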