word_smith

Modernize and update for latest Elixir.

Open warmwaffles opened this issue 11 months ago • 0 comments

  • Updates unaccent.rules
  • Replaces Benchfella with Benchee (interesting results below)
  • Handles new unaccent rules that have no replacement and instead remove the character entirely
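
For illustration, the removal-rule handling could look roughly like the sketch below. This is not the library's actual code: the module name `UnaccentRulesSketch` and both function names are hypothetical, and it assumes each rules line holds a source character optionally followed by whitespace and a replacement, with a missing replacement meaning "delete the character".

```elixir
defmodule UnaccentRulesSketch do
  # Hypothetical helper, not the library's implementation.
  # Parses rules text into a map of source => replacement.
  # A line with only a source character maps to "" (full removal).
  def parse(text) do
    for line <- String.split(text, "\n"),
        # Blank lines split to [] and are skipped by the pattern.
        [from | rest] <- [String.split(line)],
        into: %{} do
      {from, List.first(rest) || ""}
    end
  end

  # Applies the rules codepoint by codepoint; codepoints without a
  # rule are kept unchanged.
  def apply_rules(str, rules) do
    str
    |> String.codepoints()
    |> Enum.map_join(&Map.get(rules, &1, &1))
  end
end
```

With rules parsed from `"\u00C0\tA\n\u0301\n"`, `apply_rules("\u00C0e\u0301", rules)` would return `"Ae"`: the first rule replaces the character and the second removes the combining accent outright.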

Squish

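The performance gain of squish over the built-in regex approach looks negligible: squish is about 4x faster on 30-byte input, but the two are within 1-2% of each other from 300 bytes upward. It can probably be swapped back to the regex.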
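
To make the comparison concrete, here is a minimal sketch of the two kinds of implementation being benchmarked. It assumes squish trims the string and collapses runs of whitespace into a single space; the module `SquishSketch` and its function names are hypothetical and are not taken from the library. The two variants are only meant to agree on input whose whitespace is ASCII.

```elixir
defmodule SquishSketch do
  # Hypothetical sketch, not the library's code.

  # Regex variant: collapse runs of ASCII whitespace, then trim the ends.
  def regex(str) when is_binary(str) do
    str
    |> String.replace(~r/[ \t\n\r\f\x0B]+/, " ")
    |> String.trim()
  end

  # Hand-rolled variant: a single pass over the bytes of the binary.
  # `pending` is true when whitespace was seen after some output.
  def manual(str) when is_binary(str), do: walk(str, <<>>, false)

  defp walk(<<>>, acc, _pending), do: acc

  # ASCII whitespace: emit nothing yet, just remember a space is owed
  # (unless no output exists so far, which drops leading whitespace).
  defp walk(<<c, rest::binary>>, acc, _pending)
       when c in [?\s, ?\t, ?\n, ?\r, ?\f, ?\v] do
    walk(rest, acc, acc != <<>>)
  end

  # Non-whitespace byte after owed whitespace: emit one space first.
  defp walk(<<c, rest::binary>>, acc, true) do
    walk(rest, <<acc::binary, ?\s, c>>, false)
  end

  defp walk(<<c, rest::binary>>, acc, false) do
    walk(rest, <<acc::binary, c>>, false)
  end
end
```

Both `SquishSketch.regex("  hello \n\t world  ")` and `SquishSketch.manual("  hello \n\t world  ")` return `"hello world"`. Working byte by byte is safe for UTF-8 input here because ASCII whitespace bytes never occur inside a multi-byte sequence.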

Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 125.72 GB
Elixir 1.18.2
Erlang 27.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 5 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 30 bytes, 300 bytes, 3000 bytes, 30000 bytes, 300000 bytes
Estimated total run time: 1 min 40 s

Benchmarking regex with input 30 bytes ...
Benchmarking regex with input 300 bytes ...
Benchmarking regex with input 3000 bytes ...
Benchmarking regex with input 30000 bytes ...
Benchmarking regex with input 300000 bytes ...
Benchmarking squish with input 30 bytes ...
Benchmarking squish with input 300 bytes ...
Benchmarking squish with input 3000 bytes ...
Benchmarking squish with input 30000 bytes ...
Benchmarking squish with input 300000 bytes ...
Calculating statistics...
Formatting results...

##### With input 30 bytes #####
Name             ips        average  deviation         median         99th %
squish        2.13 M        0.47 μs  ±4226.03%        0.43 μs        0.61 μs
regex         0.52 M        1.91 μs   ±621.34%        1.84 μs        2.09 μs

Comparison:
squish        2.13 M
regex         0.52 M - 4.08x slower +1.44 μs

##### With input 300 bytes #####
Name             ips        average  deviation         median         99th %
squish       88.30 K       11.33 μs    ±35.02%       11.14 μs       13.24 μs
regex        87.72 K       11.40 μs    ±35.39%       11.16 μs       13.78 μs

Comparison:
squish       88.30 K
regex        87.72 K - 1.01x slower +0.0750 μs

##### With input 3000 bytes #####
Name             ips        average  deviation         median         99th %
regex         9.56 K      104.60 μs     ±4.27%      104.34 μs      111.23 μs
squish        9.48 K      105.44 μs     ±4.31%      105.30 μs      111.78 μs

Comparison:
regex         9.56 K
squish        9.48 K - 1.01x slower +0.83 μs

##### With input 30000 bytes #####
Name             ips        average  deviation         median         99th %
regex         964.17        1.04 ms     ±1.52%        1.03 ms        1.08 ms
squish        959.31        1.04 ms     ±0.88%        1.04 ms        1.06 ms

Comparison:
regex         964.17
squish        959.31 - 1.01x slower +0.00526 ms

##### With input 300000 bytes #####
Name             ips        average  deviation         median         99th %
regex          77.60       12.89 ms     ±7.27%       12.84 ms       15.29 ms
squish         76.30       13.11 ms     ±7.89%       13.07 ms       15.73 ms

Comparison:
regex          77.60
squish         76.30 - 1.02x slower +0.22 ms

Remove Accents

Because the unaccent rules list is now so large, this probably needs some rethinking. I'm not sure how yet.

Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 125.72 GB
Elixir 1.18.2
Erlang 27.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 5 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 24 bytes, 240 bytes, 2400 bytes, 24000 bytes, 240000 bytes, 2400000 bytes
Estimated total run time: 1 min

Benchmarking remove_accents with input 24 bytes ...
Benchmarking remove_accents with input 240 bytes ...
Benchmarking remove_accents with input 2400 bytes ...
Benchmarking remove_accents with input 24000 bytes ...
Benchmarking remove_accents with input 240000 bytes ...
Benchmarking remove_accents with input 2400000 bytes ...
Calculating statistics...
Formatting results...

##### With input 24 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents        4.02 M      248.76 ns  ±7741.29%         211 ns         390 ns

##### With input 240 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents      527.33 K        1.90 μs   ±760.80%        1.67 μs        2.86 μs

##### With input 2400 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents       64.49 K       15.51 μs    ±37.02%       14.42 μs       22.03 μs

##### With input 24000 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents        5.57 K      179.41 μs     ±5.74%      177.60 μs      211.60 μs

##### With input 240000 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents        348.19        2.87 ms    ±22.63%        2.96 ms        4.12 ms

##### With input 2400000 bytes #####
Name                     ips        average  deviation         median         99th %
remove_accents         23.96       41.74 ms    ±31.20%       37.05 ms       96.72 ms

warmwaffles, Jan 25 '25 00:01