AxisKeys.jl icon indicating copy to clipboard operation
AxisKeys.jl copied to clipboard

Poor performance of `getkey` compared to docs benchmarks

Open SBuercklin opened this issue 4 years ago • 2 comments

I have rerun the first set of benchmarks from /docs/speed.jl comparing getindex to getkey. In many instances getkey performs worse (both relative to getindex and relative to the values in speed.jl comments). For getkey in many instances I saw substantially worse performance (> 1 order of magnitude) compared to the getindex calls on my machine, including some allocations where there seemed to be none for the results in the comments.

I have tested this on both 1.4.2 and 1.6.1. 1.6.1 gave me ~2x speedup over the 1.4.2 results, but the numbers were still far off of what's present in the comments for getkey. The environment I ran this in was just the AxisKeys project environment with the latest version of all dependencies + BenchmarkTools. Machine information is included at the end.

Here are my results of running the benchmarks. I included the original values, and then appended my results from @btime afterwards in parentheses.

using AxisKeys, BenchmarkTools # Julia 1.4.2

#==============================#
#===== getkey vs getindex =====#
#==============================#

mat = wrapdims(rand(3,4), 11:13, 21:24)
bothmat = wrapdims(mat.data, x=11:13, y=21:24)
bothmat2 = wrapdims(mat.data, x=collect(11:13), y=collect(21:24))

@btime $mat[3, 4]    # 1.699 ns (1.699 ns)
@btime $mat(13, 24)  # 5.312 ns (124.232 ns (1 allocation: 32 bytes))

@btime $bothmat[3,4]        # 1.700 ns (1.649 ns)
@btime $bothmat[x=3, y=4]   # 1.701 ns (1.965 ns)
@btime $bothmat(13, 24)     # 5.874 ns  (124.299 ns (1 allocation: 32 bytes))
@btime $bothmat(x=13, y=24) # 14.063 ns (132.624 ns (1 allocation: 32 bytes))
@btime $bothmat2(13, 24)    # 16.719 ns (13.606 ns)

ind_collect(A) = [@inbounds(A[ijk...]) for ijk in Iterators.ProductIterator(axes(A))]
key_collect(A) = [@inbounds(A(vals...)) for vals in Iterators.ProductIterator(axiskeys(A))]

bigmat = wrapdims(rand(100,100), 1:100, 1:100);
bigmat2 = wrapdims(rand(100,100), collect(1:100), collect(1:100));

@btime ind_collect($(bigmat.data)); #  9.117 μs (4 allocations: 78.25 KiB) (12.646 μs (4 allocations: 78.25 KiB))
@btime ind_collect($bigmat);        # 11.530 μs (4 allocations: 78.25 KiB) (9.996 μs (4 allocations: 78.25 KiB))
@btime key_collect($bigmat);        # 64.064 μs (4 allocations: 78.27 KiB) (1.248 ms (10004 allocations: 390.77 KiB))
@btime key_collect($bigmat2);      # 718.804 μs (5 allocations: 78.27 KiB) (625.003 μs (5 allocations: 78.27 KiB))

twomat = wrapdims(mat.data, x=[:a, :b, :c], y=21:24)
@btime $twomat(x=:a, y=24)  # 36.734 ns (2 allocations: 64 bytes) (137.505 ns (2 allocations: 64 bytes))

@btime $twomat(24.0)        # 26.686 ns (4 allocations: 112 bytes) (44.374 ns (5 allocations: 144 bytes))
@btime $twomat(y=24.0)      # 33.860 ns (4 allocations: 112 bytes) (49.411 ns (6 allocations: 160 bytes))
@btime view($twomat, :,3)   # 24.951 ns (4 allocations: 112 bytes) (22.302 ns (4 allocations: 112 bytes))

Here's the version of everything from my instantiation:

(AxisKeys) pkg> st                                                                                                                     
Project AxisKeys v0.1.16
Status `~/Documents/AxisKeys.jl/Project.toml`
  [621f4979] AbstractFFTs v1.0.1
  [587fd27a] CovarianceEstimation v0.2.6
  [8197267c] IntervalSets v0.5.3
  [41ab1584] InvertedIndices v1.0.0
  [1fad7336] LazyStack v0.0.7
  [356022a1] NamedDims v0.2.29
  [6fe1bfb0] OffsetArrays v1.10.0
  [2913bbd2] StatsBase v0.33.8
  [bd369af6] Tables v1.4.3
  [37e2e46d] LinearAlgebra 
  [10745b16] Statistics 

I'm on a Thinkpad X1 Carbon 7th gen running Ubuntu 20.04. Here's the CPU/RAM info from lshw:

H/W path             Device          Class          Description
===============================================================
                                     system         20QDS3DQ00 (LENOVO_MT_20QD_BU_Think_FM_ThinkPad X1 Carbon
/0                                   bus            20QDS3DQ00
/0/2                                 memory         16GiB System Memory
/0/2/0                               memory         8GiB Row of chips LPDDR3 Synchronous 2133 MHz (0.5 ns)
/0/2/1                               memory         8GiB Row of chips LPDDR3 Synchronous 2133 MHz (0.5 ns)
/0/c                                 memory         256KiB L1 cache
/0/d                                 memory         1MiB L2 cache
/0/e                                 memory         8MiB L3 cache
/0/f                                 processor      Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
/0/11                                memory         128KiB BIOS
/0/100                               bridge         Coffee Lake HOST and DRAM Controller
/0/100/2                             display        UHD Graphics 620 (Whiskey Lake)
/0/100/4                             generic        Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal
/0/100/8                             generic        Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Pr

SBuercklin avatar Jun 17 '21 17:06 SBuercklin

That's not good! Thanks for the careful investigation.

It looks like this is the key difference -- the lookup from a Vector is similar, but the lookup from a range 1:100 (without collect) has become very slow. I get something similar, locally, too:

@btime key_collect($bigmat);        # 64.064 μs (4 allocations: 78.27 KiB) (1.248 ms (10004 allocations: 390.77 KiB))
@btime key_collect($bigmat2);      # 718.804 μs (5 allocations: 78.27 KiB) (625.003 μs (5 allocations: 78.27 KiB))

mcabbott avatar Jun 17 '21 17:06 mcabbott

I rolled back to 0.1.0 and tested that as well

@btime key_collect($bigmat);        # 64.064 μs (4 allocations: 78.27 KiB) (1.264 ms (10004 allocations: 390.77 KiB))           

So it seems this is an issue with some dependency rather than an AxisKeys issue :)

I'll make sure we avoid UnitRange indices for the time being, I don't think that'll cause any problems for us. Thanks!

SBuercklin avatar Jun 17 '21 18:06 SBuercklin