StatsBase.jl
Improve performance of unweighted `ecdf`
Fixes #964.
On the master branch:
julia> using StatsBase, BenchmarkTools
julia> x = randn(10_000_000);
julia> @benchmark ecdf($x)
BenchmarkTools.Trial: 7 samples with 1 evaluation per sample.
Range (min … max): 700.318 ms … 913.679 ms ┊ GC (min … max): 0.21% … 1.69%
Time (median): 714.015 ms ┊ GC (median): 0.21%
Time (mean ± σ): 750.870 ms ± 79.523 ms ┊ GC (mean ± σ): 2.37% ± 4.16%
█
█▁▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
700 ms Histogram: frequency by time 914 ms <
Memory estimate: 265.67 MiB, allocs estimate: 14.
With this PR:
julia> using StatsBase, BenchmarkTools
julia> x = randn(10_000_000);
julia> @benchmark ecdf($x)
BenchmarkTools.Trial: 34 samples with 1 evaluation per sample.
Range (min … max): 129.678 ms … 276.215 ms ┊ GC (min … max): 0.35% … 51.29%
Time (median): 131.480 ms ┊ GC (median): 0.69%
Time (mean ± σ): 147.635 ms ± 38.302 ms ┊ GC (mean ± σ): 9.64% ± 14.83%
█
██▅▁▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
130 ms Histogram: log(frequency) by time 276 ms <
Memory estimate: 189.36 MiB, allocs estimate: 16.
Failing tests are fixed by #966 and #967.