strsim-rs icon indicating copy to clipboard operation
strsim-rs copied to clipboard

avoid string copy in sorensen dice

Open maxbachmann opened this issue 6 months ago • 1 comments

Since we only need to iterate over the bigrams for each string once, we can create them lazily instead of collecting them into a string. This reduces the binary size by around 7%. In addition it reduces runtime in our current benchmark by around 11%.

For reference in my example binary this gives:

File  .text     Size Crate
6.0%  96.7% 275.8KiB std
0.1%   1.5%   4.3KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 285.3KiB .text section size, the file size is 4.5MiB

while previously it was:

File  .text     Size Crate
6.1%  96.6% 276.5KiB std
0.1%   1.6%   4.6KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 286.3KiB .text section size, the file size is 4.5MiB

maxbachmann avatar Jan 04 '24 11:01 maxbachmann