LazyArrays.jl icon indicating copy to clipboard operation
LazyArrays.jl copied to clipboard

Speed of sum(BroadcastArray)

Open mcabbott opened this issue 6 years ago • 12 comments
trafficstars

Here's something I was trying to do recently, to avoid materializing a huge array just to sum it:

V = rand(500);
V3 = reshape(V, 1,1,:);
@time sum(V .* V' .* V3)                                 # 0.57 seconds, 953.675 MiB
@time sum(BroadcastArray(*, V, V', V3))                  # 0.36 seconds
@time sum(BroadcastArray(*, V, V', V3), dims=(1,2,3))[1] # 0.014 seconds

Might it be worth making sum() simply call the dims=... version here?

Before I saw this package I was messing around with broadcasting to do this by hand, and could get in the ballpark of 0.014s here, but no better. Doing that seems likely to be more fragile.

mcabbott avatar Jan 13 '19 22:01 mcabbott

Happy to accept a PR fixing this!

dlfivefifty avatar Jan 14 '19 15:01 dlfivefifty

Will do.

But first, one more puzzle which just bit me: why is the first of these 5x quicker?

V2 = reshape(V, 1,:)
@time sum(BroadcastArray(*, V, V', V3), dims=(2,3));    # 0.025 seconds
@time sum(BroadcastArray(*, V, V2, V3), dims=(2,3));    # 0.132 seconds

Same times here:

W = similar(V);
@time sum!(W, BroadcastArray(*, V, V', V3));
@time sum!(W, BroadcastArray(*, V, V2, V3));

mcabbott avatar Jan 18 '19 18:01 mcabbott

I looked at the profile and the second one spends more time in getindex. I can't seem to think of a good reason why.

dlfivefifty avatar Jan 19 '19 08:01 dlfivefifty

Thanks for taking a look, it is indeed strange. The biggest difference in an isolated getindex I can see is... far from a factor of 5:

B1 = BroadcastArray(*, V, V', V3).broadcasted
B2 = BroadcastArray(*, V, V2, V3).broadcasted
@btime $B1[3,3,3]    # 3.816 ns
@btime $B2[3,3,3]    # 4.205 ns

mcabbott avatar Jan 19 '19 16:01 mcabbott

Remember order of access of data makes a big difference, but I’m not sure how to test that

dlfivefifty avatar Jan 19 '19 16:01 dlfivefifty

Yes that could be what's happening. I was trying to dig into what CartesianIndices etc. give, but couldn't find any difference. Here's another attempt, but I don't know what this means...

V = collect(11:12)/1;

W = collect(21:23)/1;
V2 = reshape(collect(21:23)/1, 1,:);

V3 = reshape(collect(31:34)/1, 1,1,:);

function Base.getindex(A::AbstractArray, I...) 
    Base.error_if_canonical_getindex(IndexStyle(A), A, I...)
    out = Base._getindex(IndexStyle(A), A, to_indices(A, I)...)
    println("  ",out,"  ",I)
    out
end

using LazyArrays

sum(BroadcastArray(*, V, W', V3), dims=(2,3));

sum(BroadcastArray(*, V, V2, V3), dims=(2,3));

The fast one ends with this, the slow one is the same but skips the lines starting 23.0:

  23.0  (CartesianIndex(1, 3),)
  8602.0  (1, CartesianIndex(3, 4))
  23.0  (CartesianIndex(1, 3),)
  9384.0  (2, CartesianIndex(3, 4))

mcabbott avatar Jan 19 '19 18:01 mcabbott

Ohhhhh, the answer is simple! V and V’ share memory, while reshape(V,1,:) allocates new memory. The compiler might be smart enough to only load memory, or if not it’s because it doesn’t have to jump to a new location.

dlfivefifty avatar Jan 19 '19 18:01 dlfivefifty

But if I set V2[2] = 99 then V is altered. However typeof(V2) and dump(V2) show no evidence of this, so perhaps at some level it's ignorant of the connection, and thus reads it twice? (But 5x?)

I also just tried making all three factors different random numbers, but one adjointed, one reshaped. And the speed difference is the same.

mcabbott avatar Jan 19 '19 19:01 mcabbott

Hmm, you are right. Here's another data point:

V4 = Base.ReshapedArray(V, (1, 500), ())
@time sum(BroadcastArray(*, V, V4, V3), dims=(2,3));    # 0.132 seconds

dlfivefifty avatar Jan 19 '19 21:01 dlfivefifty

BTW, I just saw https://github.com/JuliaLang/julia/pull/30973 and wonder if that issue (28126) might be what is going on here. Haven't tried it out yet.

mcabbott avatar Feb 06 '19 20:02 mcabbott

Can we close this?

dlfivefifty avatar Feb 08 '19 10:02 dlfivefifty

I remain puzzled by the V2 = reshape(V, 1,:) issue, although it's not obviously this package's fault.

mcabbott avatar Feb 08 '19 11:02 mcabbott