                        Performance improvement
Only sort the tail values of the array, rather than the whole thing.
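One way the idea can be implemented (a sketch with hypothetical names, not necessarily this PR's diff): base R's sort(partial =) can place a single cutoff element in its sorted position with a quickselect-style pass, after which only the tail values themselves need a full sort.

# Sketch of the technique, not the loo package's actual code.
# Place the cutoff element with a single-index partial sort (roughly O(S)),
# then fully sort only the n_tail largest values (O(n_tail * log(n_tail))).
partial_tail_sort <- function(x, n_tail) {
  S <- length(x)
  cut_idx <- S - n_tail                # position of the cutoff in a full sort
  y <- sort(x, partial = cut_idx)      # y[cut_idx] is in its sorted position;
                                       # everything after it is >= y[cut_idx]
  list(
    cutoff = y[cut_idx],               # largest value smaller than the tail
    tail   = sort(y[(cut_idx + 1):S])  # the sorted tail values
  )
}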
Does this keep the largest value that is smaller than the tail values in its correct position?
cutoff <- ord$x[min(tail_ids) - 1] # largest value smaller than tail values
Hopefully someday my brain will stop making off-by-one errors, but today is not that day.
Maybe I need to rephrase. After your change, how do you determine the largest value that is smaller than the tail values? If you don't sort them, I think you should at least find the largest value among the non-tail values, e.g. with the max() function.
Doesn't the code determine that by sorting the n+1 largest values into position, where n is the tail length? I might be misunderstanding what R's partial sort does.
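For reference, that is what R's partial sort guarantees: each index listed in partial ends up holding the same value as in a full sort. A quick sketch with made-up data:

set.seed(1)
x <- rnorm(1000)
y <- sort(x, partial = c(900, 950))  # place only these two positions
stopifnot(y[900] == sort(x)[900],    # each requested position matches
          y[950] == sort(x)[950])    # the fully sorted vector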
Ah, I was too tired when checking this and got confused. I'm also confused because the off-by-one commit didn't make any change to the code; the only change is in the JSON. So how did you fix that off-by-one error?
Whoops, pushed the wrong change -- this should fix the off-by-one error. (I was accidentally only sorting the tail, rather than sorting from the last element not in the tail.)
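In terms of the cutoff line quoted above, a sketch of the fixed indexing (made-up data and tail length, not the actual commit): the partial range has to start one position below the tail so that the cutoff element gets placed too.

# Sketch of the fixed indexing, with made-up data; not the actual commit.
set.seed(1)
x <- rnorm(1000)
S <- length(x)
n_tail <- 95                          # made-up tail length
tail_ids <- (S - n_tail + 1):S        # tail positions in a full sort
# partial = tail_ids alone leaves position min(tail_ids) - 1 unplaced, so the
# cutoff read from it could be wrong; extending the range down by one fixes it.
y <- sort(x, partial = (min(tail_ids) - 1):S)
cutoff <- y[min(tail_ids) - 1]        # largest value smaller than the tail
stopifnot(cutoff == sort(x)[min(tail_ids) - 1])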
Thanks @ParadaCarleton and sorry for not getting to this sooner! I just triggered all the continuous integration tests to run again so let's see if they pass.
And yeah, @avehtari I think the latest commit from Carlos fixes the off-by-one issue, although if you want to double-check that would be great. The tests should also catch it, since I think we have regression tests that detect any changes to the loo results (and presumably an off-by-one error would lead to some change in the results).
Actually some tests seem to be failing, so we'll need to look into that: https://github.com/stan-dev/loo/pull/185/checks?check_run_id=3656664234#step:11:150
Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that in Julia at least it was always slower. e.g.
julia> using BenchmarkTools
julia> tail_length(reff, S) = min(cld(S, 5), ceil(Int, 3 * sqrt(S / reff)));
julia> function foo(logw)
           S = length(logw)
           M = tail_length(1, S)
           return sort(logw)[M:S]
       end;
julia> function bar(logw)
           S = length(logw)
           M = tail_length(1, S)
           return partialsort(logw, M:S)
       end;
julia> logw = randn(1000);
julia> foo(logw) == bar(logw)
true
julia> @btime $foo($logw);
  18.822 μs (2 allocations: 15.19 KiB)
julia> @btime $bar($logw);
  21.504 μs (1 allocation: 7.94 KiB)
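An analogous check in R might look like the sketch below (it assumes the microbenchmark package, and the sizes are made up). One caveat worth knowing: if I recall correctly, base R falls back to a full sort when partial has more than about ten indices, so the partial version here places just the cutoff with a single index and then sorts only the tail.

# Rough R benchmark sketch with made-up sizes; not from the PR itself.
library(microbenchmark)
S <- 1e5
k <- ceiling(3 * sqrt(S))            # tail length on the order of sqrt(S)
x <- rnorm(S)
microbenchmark(
  full = sort(x)[(S - k + 1):S],     # sort everything, keep the tail
  partial = {
    y <- sort(x, partial = S - k)    # quickselect-style pass for the cutoff
    sort(y[(S - k + 1):S])           # then sort just the k tail values
  }
)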
Are there benchmarks showing that this is indeed a performance improvement in R?
I benchmarked this and got a major speedup in Julia when I switched to sorting only partially; I assume the difference has to do with Julia's sorting algorithms, which have some significant problems with cache misses. Feel free to discuss this with me on Slack.
Are there benchmarks showing that this is indeed a performance improvement in R?
You're sorting from the tail length to the end of the array, which sorts everything except the smallest M-1 values rather than just the tail. The result is that you're asking for almost the whole array to be sorted (~900 elements instead of ~100), so partialsort's constant overhead outweighs its asymptotic advantage. Sorting k elements out of n costs O(n + k log(k)); because k = O(sqrt(n)) here, the cost is linear in the number of samples instead of O(n*log(n)). (The performance is still better for small samples, though, because k is at most 20% of the sample.)
julia> @benchmark partialsort($x, (1000-len):1000)
BenchmarkTools.Trial: 
  memory estimate:  8.03 KiB
  allocs estimate:  4
  --------------
  minimum time:     2.609 μs (0.00% GC)
  median time:      2.741 μs (0.00% GC)
  mean time:        3.107 μs (6.50% GC)
  maximum time:     177.498 μs (89.91% GC)
  --------------
  samples:          10000
  evals/sample:     9
julia> @benchmark sort($x)
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     12.518 μs (0.00% GC)
  median time:      13.624 μs (0.00% GC)
  mean time:        14.270 μs (0.00% GC)
  maximum time:     213.956 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
You're sorting from the tail length to the end of the array, which sorts everything except the smallest M-1 values rather than just the tail.
Good catch! Yes, I'm seeing the same performance improvement you're seeing now.