StatsBase.jl

Improve performance of unweighted `ecdf`

Open: devmotion opened this pull request 2 months ago · 1 comment

Fixes #964.

On the master branch:

julia> using StatsBase, BenchmarkTools

julia> x = randn(10_000_000);

julia> @benchmark ecdf($x)
BenchmarkTools.Trial: 7 samples with 1 evaluation per sample.
 Range (min … max):  700.318 ms … 913.679 ms  ┊ GC (min … max): 0.21% … 1.69%
 Time  (median):     714.015 ms               ┊ GC (median):    0.21%
 Time  (mean ± σ):   750.870 ms ±  79.523 ms  ┊ GC (mean ± σ):  2.37% ± 4.16%

  █
  █▁▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  700 ms           Histogram: frequency by time          914 ms <

 Memory estimate: 265.67 MiB, allocs estimate: 14.

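With this PR (median time drops from about 714 ms to about 131 ms, roughly 5.4x faster, and the memory estimate drops from 265.67 MiB to 189.36 MiB):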

julia> using StatsBase, BenchmarkTools

julia> x = randn(10_000_000);

julia> @benchmark ecdf($x)
BenchmarkTools.Trial: 34 samples with 1 evaluation per sample.
 Range (min … max):  129.678 ms … 276.215 ms  ┊ GC (min … max): 0.35% … 51.29%
 Time  (median):     131.480 ms               ┊ GC (median):    0.69%
 Time  (mean ± σ):   147.635 ms ±  38.302 ms  ┊ GC (mean ± σ):  9.64% ± 14.83%

  █
  ██▅▁▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  130 ms        Histogram: log(frequency) by time        276 ms <

 Memory estimate: 189.36 MiB, allocs estimate: 16.

devmotion, Oct 10 '25 19:10

Failing tests are fixed by #966 and #967.

devmotion, Oct 14 '25 11:10