elixir icon indicating copy to clipboard operation
elixir copied to clipboard

do Enum.with_index/2 tail-recursively on lists

Open ckampfe opened this issue 1 year ago • 1 comments

Hi folks!

This PR modifies the implementation of Enum.with_index/2, specifically with regard to lists.

The tl;dr is that there is a fair bit of throughput to be gained on both large and small lists at the expense of some memory.

Is this a tradeoff the project wants to make? That's not for me to say, but I just wanted to share this work in case it was. I will say I'm going to be using Enum.with_index/2 on some fairly large data processing at work, so I do have a personal bias toward throughput in this scenario.

The usual caveats apply here, as I ran all benchmarking on my development machine while trying to keep as few things running as possible to minimize noisy neighbors, but it is a development machine, not a dedicated host.

Benchmark code:

IO.inspect(Mix.env(), label: "Mix env")

f = fn index, item ->
  {index, item + 100}
end

tiny = Enum.into(1..100, [])
small = Enum.into(1..1000, [])
medium = Enum.into(1..10_000, [])
bigger = Enum.into(1..100_000, [])
large = Enum.into(1..1_000_000, [])

Benchee.run(
  %{
    "Enum.with_index/2 with fn" => fn input -> Enum.with_index(input, f) end,
    "Enum.with_index/2 with offset" => fn input -> Enum.with_index(input, 8) end
  },
  inputs: %{
    "a. tiny - 100" => tiny,
    "b. small - 1,000" => small,
    "c. medium - 10,000" => medium,
    "d. bigger - 100,000" => bigger,
    "e. large - 1,000,000" => large
  },
  time: 6,
  memory_time: 2
)

To run the benchmarks I first built Elixir on main like make clean && make, then ran them with ~/code/elixir/bin/elixir ~/code/elixir/bin/mix clean && MIX_ENV=prod ~/code/elixir/bin/elixir ~/code/elixir/bin/mix run bench.exs, then changed branches to this branch, then rebuilt Elixir with make clean && make, and then reran the benchmarks with the same command as before.

Benchmark on main

$ ~/code/elixir/bin/elixir ~/code/elixir/bin/mix clean && MIX_ENV=prod ~/code/elixir/bin/elixir ~/code/elixir/bin/mix run bench.exs
Compiling 2 files (.ex)
Generated wi2 app
Mix env: :prod
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.18.0-dev
Erlang 27.1.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 6 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: a. tiny - 100, b. small - 1,000, c. medium - 10,000, d. bigger - 100,000, e. large - 1,000,000
Estimated total run time: 1 min 40 s

Benchmarking Enum.with_index/2 with fn with input a. tiny - 100 ...
Benchmarking Enum.with_index/2 with fn with input b. small - 1,000 ...
Benchmarking Enum.with_index/2 with fn with input c. medium - 10,000 ...
Benchmarking Enum.with_index/2 with fn with input d. bigger - 100,000 ...
Benchmarking Enum.with_index/2 with fn with input e. large - 1,000,000 ...
Benchmarking Enum.with_index/2 with offset with input a. tiny - 100 ...
Benchmarking Enum.with_index/2 with offset with input b. small - 1,000 ...
Benchmarking Enum.with_index/2 with offset with input c. medium - 10,000 ...
Benchmarking Enum.with_index/2 with offset with input d. bigger - 100,000 ...
Benchmarking Enum.with_index/2 with offset with input e. large - 1,000,000 ...
Calculating statistics...
Formatting results...

##### With input a. tiny - 100 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with fn            1.10 M        0.91 μs  ±1950.65%        0.83 μs        1.04 μs
Enum.with_index/2 with offset        0.98 M        1.02 μs  ±2040.57%        0.79 μs        1.13 μs

Comparison:
Enum.with_index/2 with fn            1.10 M
Enum.with_index/2 with offset        0.98 M - 1.13x slower +0.118 μs

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with fn             3.91 KB
Enum.with_index/2 with offset         3.91 KB - 1.00x memory usage +0 KB

**All measurements for memory usage were the same**

##### With input b. small - 1,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset      107.96 K        9.26 μs    ±97.51%        7.83 μs       35.46 μs
Enum.with_index/2 with fn          104.87 K        9.54 μs    ±75.92%        9.13 μs       13.50 μs

Comparison:
Enum.with_index/2 with offset      107.96 K
Enum.with_index/2 with fn          104.87 K - 1.03x slower +0.27 μs

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset        39.06 KB
Enum.with_index/2 with fn            39.06 KB - 1.00x memory usage +0 KB

**All measurements for memory usage were the same**

##### With input c. medium - 10,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset       10.44 K       95.82 μs     ±3.89%       96.17 μs      103.42 μs
Enum.with_index/2 with fn            7.65 K      130.64 μs    ±12.93%      128.17 μs      252.25 μs

Comparison:
Enum.with_index/2 with offset       10.44 K
Enum.with_index/2 with fn            7.65 K - 1.36x slower +34.82 μs

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset       390.63 KB
Enum.with_index/2 with fn           390.63 KB - 1.00x memory usage +0 KB

**All measurements for memory usage were the same**

##### With input d. bigger - 100,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset        868.12        1.15 ms     ±9.55%        1.15 ms        1.18 ms
Enum.with_index/2 with fn            585.40        1.71 ms     ±7.09%        1.70 ms        1.83 ms

Comparison:
Enum.with_index/2 with offset        868.12
Enum.with_index/2 with fn            585.40 - 1.48x slower +0.56 ms

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset         3.81 MB
Enum.with_index/2 with fn             3.81 MB - 1.00x memory usage +0.00005 MB

**All measurements for memory usage were the same**

##### With input e. large - 1,000,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset         40.75       24.54 ms    ±33.75%       22.54 ms       44.82 ms
Enum.with_index/2 with fn             28.37       35.24 ms    ±90.64%       29.13 ms      174.37 ms

Comparison:
Enum.with_index/2 with offset         40.75
Enum.with_index/2 with fn             28.37 - 1.44x slower +10.71 ms

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset        38.15 MB
Enum.with_index/2 with fn            38.15 MB - 1.00x memory usage -0.00060 MB

**All measurements for memory usage were the same**

Benchmark on this branch

$ ~/code/elixir/bin/elixir ~/code/elixir/bin/mix clean && MIX_ENV=prod ~/code/elixir/bin/elixir ~/code/elixir/bin/mix run bench.exs
Compiling 2 files (.ex)
Generated wi2 app
Mix env: :prod
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.18.0-dev
Erlang 27.1.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 6 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: a. tiny - 100, b. small - 1,000, c. medium - 10,000, d. bigger - 100,000, e. large - 1,000,000
Estimated total run time: 1 min 40 s

Benchmarking Enum.with_index/2 with fn with input a. tiny - 100 ...
Benchmarking Enum.with_index/2 with fn with input b. small - 1,000 ...
Benchmarking Enum.with_index/2 with fn with input c. medium - 10,000 ...
Benchmarking Enum.with_index/2 with fn with input d. bigger - 100,000 ...
Benchmarking Enum.with_index/2 with fn with input e. large - 1,000,000 ...
Benchmarking Enum.with_index/2 with offset with input a. tiny - 100 ...
Benchmarking Enum.with_index/2 with offset with input b. small - 1,000 ...
Benchmarking Enum.with_index/2 with offset with input c. medium - 10,000 ...
Benchmarking Enum.with_index/2 with offset with input d. bigger - 100,000 ...
Benchmarking Enum.with_index/2 with offset with input e. large - 1,000,000 ...
Calculating statistics...
Formatting results...

##### With input a. tiny - 100 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset        2.48 M      403.93 ns  ±5368.21%         333 ns         541 ns
Enum.with_index/2 with fn            1.69 M      591.08 ns  ±4174.85%         541 ns         709 ns

Comparison:
Enum.with_index/2 with offset        2.48 M
Enum.with_index/2 with fn            1.69 M - 1.46x slower +187.15 ns

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset         3.94 KB
Enum.with_index/2 with fn             5.47 KB - 1.39x memory usage +1.53 KB

**All measurements for memory usage were the same**

##### With input b. small - 1,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset      247.39 K        4.04 μs   ±186.39%        3.58 μs       13.88 μs
Enum.with_index/2 with fn          172.72 K        5.79 μs    ±89.57%        5.25 μs       17.17 μs

Comparison:
Enum.with_index/2 with offset      247.39 K
Enum.with_index/2 with fn          172.72 K - 1.43x slower +1.75 μs

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset        54.69 KB
Enum.with_index/2 with fn            54.69 KB - 1.00x memory usage +0 KB

**All measurements for memory usage were the same**

##### With input c. medium - 10,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with fn           16.55 K       60.43 μs     ±7.29%       60.08 μs       67.42 μs
Enum.with_index/2 with offset       15.75 K       63.49 μs    ±87.38%       42.71 μs      260.67 μs

Comparison:
Enum.with_index/2 with fn           16.55 K
Enum.with_index/2 with offset       15.75 K - 1.05x slower +3.06 μs

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with fn           430.33 KB
Enum.with_index/2 with offset       430.36 KB - 1.00x memory usage +0.0313 KB

**All measurements for memory usage were the same**

##### With input d. bigger - 100,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset        1.10 K        0.91 ms     ±4.67%        0.90 ms        0.99 ms
Enum.with_index/2 with fn            0.95 K        1.06 ms    ±14.70%        1.05 ms        1.27 ms

Comparison:
Enum.with_index/2 with offset        1.10 K
Enum.with_index/2 with fn            0.95 K - 1.17x slower +0.151 ms

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset         4.83 MB
Enum.with_index/2 with fn             4.92 MB - 1.02x memory usage +0.0931 MB

**All measurements for memory usage were the same**

##### With input e. large - 1,000,000 #####
Name                                    ips        average  deviation         median         99th %
Enum.with_index/2 with offset         56.54       17.69 ms    ±32.77%       18.57 ms       32.38 ms
Enum.with_index/2 with fn             42.61       23.47 ms    ±31.16%       24.80 ms       40.53 ms

Comparison:
Enum.with_index/2 with offset         56.54
Enum.with_index/2 with fn             42.61 - 1.33x slower +5.79 ms

Memory usage statistics:

Name                             Memory usage
Enum.with_index/2 with offset        51.49 MB
Enum.with_index/2 with fn            53.32 MB - 1.04x memory usage +1.82 MB

**All measurements for memory usage were the same**

ckampfe avatar Oct 24 '24 20:10 ckampfe

Thank you for the PR and the excellent work! The general guideline is that list-recursive vs tail-recursive is roughly the same (often trading CPU and memory, as you measured) and we should go with the code that is cleaner (which tends to be the one that does not pass the accumulator around). So I'd vote to keep it as is. But maybe @sabiwara has other thoughts.

josevalim avatar Oct 24 '24 20:10 josevalim

Here is the PR where we originally discussed this: https://github.com/elixir-lang/elixir/pull/11765

I think nobody had strong opinions back then, either seemed fine so we kept the simpler one.

sabiwara avatar Oct 24 '24 23:10 sabiwara

Wow, interesting. Regardless of your decision, that PR (and the links in it) is great context I had not seen before, thank you.

ckampfe avatar Oct 24 '24 23:10 ckampfe

Closing this then, thank you :)

josevalim avatar Oct 25 '24 05:10 josevalim