
Performance improvement

Open ParadaCarleton opened this issue 4 years ago • 12 comments

Only sort the tail values of the array, rather than the whole thing.
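
For context, the tail-fitting step in PSIS only needs the largest tail values in sorted order, so a partial sort of those positions is enough. Here is a minimal sketch of the idea in Julia (illustrative only, not the PR's R code; the tail-length heuristic below assumes r_eff = 1):

julia> logw = randn(4000);               # log importance weights

julia> S = length(logw);

julia> M = min(cld(S, 5), ceil(Int, 3 * sqrt(S)));   # tail length, assuming r_eff = 1

julia> tail_ids = (S - M + 1):S;         # positions of the tail in the sorted array

julia> sorted_tail = partialsort(logw, tail_ids);    # sorts only what PSIS needs

julia> sorted_tail == sort(logw)[tail_ids]
true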

ParadaCarleton avatar Aug 27 '21 02:08 ParadaCarleton

Is this keeping the largest value smaller than the tail values in its correct position?

cutoff <- ord$x[min(tail_ids) - 1] # largest value smaller than tail values

avehtari avatar Sep 20 '21 13:09 avehtari

Hopefully someday my brain will stop making off-by-one errors, but today is not that day.

ParadaCarleton avatar Sep 20 '21 16:09 ParadaCarleton

Maybe I need to rephrase. After your change, how do you determine the largest value that is smaller than the tail values? If you don't sort them, I think you should at least find the largest value among the non-tail values, e.g. with the max() function.

avehtari avatar Sep 20 '21 18:09 avehtari

> Maybe I need to rephrase. After your change, how do you determine the largest value that is smaller than the tail values? If you don't sort them, I think you should at least find the largest value among the non-tail values, e.g. with the max() function.

Doesn't the code determine that by sorting the n+1 largest values into position, where n is the tail length? I might be misunderstanding what R's partial sort does.
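
For what it's worth, here is a small illustration of that point in Julia (made-up numbers, not the PR's code): partially sorting the n + 1 trailing positions also puts the cutoff, i.e. the largest non-tail value, into its correct place.

julia> x = randn(1000);

julia> S = length(x); M = 95;            # M plays the role of the tail length n

julia> vals = partialsort(x, (S - M):S); # n + 1 positions: cutoff followed by the tail

julia> cutoff = vals[1];                 # largest value smaller than the tail values

julia> cutoff == sort(x)[S - M]          # agrees with a full sort
true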

ParadaCarleton avatar Sep 20 '21 18:09 ParadaCarleton

Ah, I was too tired when checking this and got confused. I'm also confused because the off-by-one commit didn't make any change to the code; the only change is in the JSON. So how did you fix that off-by-one error?

avehtari avatar Sep 20 '21 19:09 avehtari

> Ah, I was too tired when checking this and got confused. I'm also confused because the off-by-one commit didn't make any change to the code; the only change is in the JSON. So how did you fix that off-by-one error?

Whoops, pushed the wrong change -- this should fix the off-by-one error. (I was accidentally only sorting the tail, rather than sorting from the last element not in the tail.)

ParadaCarleton avatar Sep 20 '21 21:09 ParadaCarleton

Thanks @ParadaCarleton and sorry for not getting to this sooner! I just triggered all the continuous integration tests to run again so let's see if they pass.

And yeah, @avehtari I think the latest commit from Carlos fixes the off-by-one issue, although if you want to double-check, that would be great. The tests should also catch it, since I think we have regression tests that detect any changes to the loo results (and presumably an off-by-one error would lead to some change in the results).

jgabry avatar Sep 20 '21 21:09 jgabry

Actually some tests seem to be failing, so we'll need to look into that: https://github.com/stan-dev/loo/pull/185/checks?check_run_id=3656664234#step:11:150

jgabry avatar Sep 20 '21 21:09 jgabry

Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that, in Julia at least, it was always slower, e.g.:

julia> using BenchmarkTools

julia> tail_length(reff, S) = min(cld(S, 5), ceil(Int, 3 * sqrt(S / reff)));

julia> function foo(logw)
           S = length(logw)
           M = tail_length(1, S)
           return sort(logw)[M:S]
       end;

julia> function bar(logw)
           S = length(logw)
           M = tail_length(1, S)
           return partialsort(logw, M:S)
       end;

julia> logw = randn(1000);

julia> foo(logw) == bar(logw)
true

julia> @btime $foo($logw);
  18.822 μs (2 allocations: 15.19 KiB)

julia> @btime $bar($logw);
  21.504 μs (1 allocation: 7.94 KiB)

sethaxen avatar Dec 11 '21 01:12 sethaxen

> Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that, in Julia at least, it was always slower. […]

I benchmarked this and got a major speedup in Julia when I switched to sorting only partially; I assume the difference has to do with Julia's sorting algorithms, which have some significant problems with cache misses. Feel free to discuss this with me on Slack.

ParadaCarleton avatar Dec 11 '21 03:12 ParadaCarleton

> Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that, in Julia at least, it was always slower. […]

You're sorting from the tail length to the end of the array, which accidentally puts almost the whole array in sorted order rather than just the tail. The result is that you're asking for roughly 900 elements to be sorted instead of roughly 100, so the constant overhead of partialsort outweighs its better asymptotic performance. The cost of sorting k elements out of a list of n is O(n + k log(k)); because k = O(sqrt(n)), the cost is linear in the number of samples instead of O(n log(n)). (Partial sorting is still faster for small samples, though, because k is capped at 20% of the sample.)

julia> @benchmark partialsort($x, (1000-len):1000)
BenchmarkTools.Trial: 
  memory estimate:  8.03 KiB
  allocs estimate:  4
  --------------
  minimum time:     2.609 μs (0.00% GC)
  median time:      2.741 μs (0.00% GC)
  mean time:        3.107 μs (6.50% GC)
  maximum time:     177.498 μs (89.91% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark sort($x)
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     12.518 μs (0.00% GC)
  median time:      13.624 μs (0.00% GC)
  mean time:        14.270 μs (0.00% GC)
  maximum time:     213.956 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
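
To make the two ranges concrete, here is a sketch with hypothetical variable names, assuming S = 1000 and r_eff = 1:

julia> logw = randn(1000);

julia> S = length(logw);

julia> M = min(cld(S, 5), ceil(Int, 3 * sqrt(S)))    # tail_length(1, S)
95

julia> length(M:S)               # accidental range: almost the whole array
906

julia> length((S - M + 1):S)     # intended range: just the M largest values
95

julia> partialsort(logw, (S - M + 1):S) == sort(logw)[(S - M + 1):S]
true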

ParadaCarleton avatar Dec 23 '21 20:12 ParadaCarleton

> You're sorting from the tail length to the end of the array, which accidentally puts almost the whole array in sorted order rather than just the tail.

Good catch! Yes, I'm seeing the same performance improvement you're seeing now.

sethaxen avatar Dec 29 '21 15:12 sethaxen