Logical Operators slower?
I expected LazyArrays to outperform a generator when logical operators are involved, but it does not. Is this expected? I'm including an MWE, since the page says it is helpful to report cases where the code is slower.
This is faster
using Random, BenchmarkTools; Random.seed!(1234)
x = rand(100)
function foo(x)
condition1(a) = a > 0.25
condition2(a) = a < 0.75
all_conditions = Iterators.map(a -> condition1(a) && condition2(a) , x)
sum(all_conditions)
end
@btime foo($x)
relative to
using Random, BenchmarkTools, LazyArrays; Random.seed!(1234)
x = rand(100)
function foo(x)
condition1(a) = a > 0.25
condition2(a) = a < 0.75
all_conditions(a) = condition1(a) && condition2(a)
sum(@~ all_conditions.(x))
end
@btime foo($x)
You should include the outputs so we can see what the difference in timings is.
Why do you expect it to "outperform"?
Let me state the problem more generally. I'm writing a note about the benefits of using LazyArrays, but I can't identify when it's better than base generators.
I consistently find that LazyArrays outperforms Base generators (used directly or through Iterators.map), so I was surprised that in some specific cases it does not. For instance, the first comparison below shows what I usually get: LazyArrays is much faster.
#Iterators.map
using Random, BenchmarkTools; Random.seed!(123) # set the seed for reproducibility
x = rand(1_000_000) ; y = rand(1_000_000)
function foo(x,y)
lx(a) = 4 * a^3 + 3 * a^2 + 2 * a + 1
ly(b) = 2 * b^3 + 3 * b^2 + 4 * b + 1
temp(a,b) = lx(a) / ly(b)
sum(Iterators.map(temp, x,y))
end
@btime foo($x, $y)
#LazyArrays
using Random, BenchmarkTools, LazyArrays; Random.seed!(123) # set the seed for reproducibility
x = rand(1_000_000) ; y = rand(1_000_000)
function foo(x,y)
lx(a) = 4 * a^3 + 3 * a^2 + 2 * a + 1
ly(b) = 2 * b^3 + 3 * b^2 + 4 * b + 1
temp(a,b) = lx(a) / ly(b)
sum(@~ temp.(x,y))
end
@btime foo($x, $y)
############### RESULTS ##############
# x comprising 100 elements
# Iterators.map: 123.214 ns (0 allocations: 0 bytes)
# @~ : 44.995 ns (0 allocations: 0 bytes)
# x comprising 100_000 elements
# Iterators.map: 122.000 μs (0 allocations: 0 bytes)
# @~ : 39.200 μs (0 allocations: 0 bytes)
# x comprising 1_000_000 elements
# Iterators.map: 1.232 ms (0 allocations: 0 bytes)
# @~ : 39.200 μs (0 allocations: 0 bytes)
Another case is weighted means, where foo3 even outperforms the implementation in StatsBase:
temp(x,share) = x * share
foo1(x,y) = sum(temp.(x,y))
foo2(x,y) = sum(temp(a,b) for (a,b) in zip(x,y))
foo3(x,y) = sum(@~ temp.(x,y))
In contrast, when logical operators are involved, LazyArrays seems to be less performant. The difference is not that big, but given the previous results I was expecting an improvement here as well.
#Iterators.map
using Random, BenchmarkTools; Random.seed!(123) # set the seed for reproducibility
x = rand(1_000_000)
function foo(x)
condition1(a) = a > 0.25
condition2(a) = a < 0.75
all_conditions = Iterators.map(a -> condition1(a) && condition2(a) , x)
sum(all_conditions)
end
@btime foo($x)
#LazyArrays
using Random, BenchmarkTools, LazyArrays; Random.seed!(123) # set the seed for reproducibility
x = rand(1_000_000)
function foo(x)
condition1(a) = a > 0.25
condition2(a) = a < 0.75
all_conditions(a) = condition1(a) && condition2(a)
sum(@~ all_conditions.(x))
end
@btime foo($x)
############### RESULTS ##############
# x comprising 100 elements
# @~ : 14.529 ns (0 allocations: 0 bytes)
# Iterators.map: 8.909 ns (0 allocations: 0 bytes)
# x comprising 100_000 elements
# @~ : 8.100 μs (0 allocations: 0 bytes)
# Iterators.map: 7.050 μs (0 allocations: 0 bytes)
# x comprising 1_000_000 elements
# @~ : 106.200 μs (0 allocations: 0 bytes)
# Iterators.map: 97.100 μs (0 allocations: 0 bytes)
So, my question is: just from a practical point of view and for scenarios not involving matrices (just vectors), is there any general recommendation about when LazyArrays is ideal? I'd like to write a note with a few rules of thumb, so that the readers can have a rough idea.
Many thanks for your time!!!
I believe the relevant code is here:
https://github.com/JuliaArrays/LazyArrays.jl/blob/3ecdcd03d2ee23622c711e5362ae503a166f3a5e/src/lazybroadcasting.jl#L108
It might have to do with the use of @simd: without conditionals, the loop is SIMD-able.
My only thought is that when it's not SIMD-able, perhaps the for loop is doing bounds checking?
Note I didn't write this code, @mcabbott did, so perhaps he has some insight.
I have no memory of this, but making lazy things fast is tricky. Some chance this code is missing a Broadcast.instantiate? Some chance the loop should just be replaced with sum(bc) as Base has got better at handling un-materialised Broadcasted objects.
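For illustration, here is a minimal sketch of the "just call sum(bc)" idea using only Base, with no LazyArrays involved. The helper name `in_range` and the small input vector are made up for this example, and it says nothing about how LazyArrays implements its own reduction; it only shows that Base can reduce an un-materialised Broadcasted object once it has been instantiated.

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical predicate combining the two conditions from the MWE.
in_range(a) = a > 0.25 && a < 0.75

x = [0.1, 0.3, 0.5, 0.7, 0.9]

# Build the lazy broadcast object; nothing is materialised into an array.
# `instantiate` computes and checks the axes so the object is ready to use.
bc = instantiate(broadcasted(in_range, x))

# Base can reduce the un-materialised Broadcasted object directly.
count_lazy = sum(bc)

# Reference: the same reduction through a Base iterator.
count_iter = sum(Iterators.map(in_range, x))
```

Both calls count the three elements (0.3, 0.5, 0.7) that satisfy the predicate, so timing them against each other with @btime on larger inputs is one way to check whether the gap comes from the hand-written loop.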
My general expectation would be that Iterators.map and LazyArrays should have the same performance, in simple uses like iteration, and that any deviations from this are more or less bugs, to be fixed on one side or the other. The point of LazyArrays is more that they are array-like, and can be indexed or preserve a shape for broadcasting, which Base's iterators will not do.
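A small sketch of that array-like distinction, assuming LazyArrays' `BroadcastArray(f, args...)` constructor; the helper `square` and the input vector are made up for the example.

```julia
using LazyArrays

# Hypothetical element-wise function for the example.
square(a) = a^2
x = [1.0, 2.0, 3.0]

lazy_arr = BroadcastArray(square, x)   # lazy, and an AbstractArray
lazy_itr = Iterators.map(square, x)    # lazy, but only an iterator

# The lazy array has a shape and supports indexing; elements are computed on demand.
arr_is_array = lazy_arr isa AbstractArray
arr_size     = size(lazy_arr)
second_elem  = lazy_arr[2]

# The Base iterator is not an array and has no getindex method.
itr_is_array  = lazy_itr isa AbstractArray
itr_indexable = applicable(getindex, lazy_itr, 2)

# For plain iteration or reduction, both give the same answer.
same_sum = sum(lazy_arr) == sum(lazy_itr)
```

So a rough rule of thumb from this thread: if the lazy object is only consumed once by a reduction, an iterator is enough; if it needs indexing or has to keep its shape for later broadcasting, the array-like wrapper is the one that offers that.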
This is great to know. I have a better idea now! Thank you very much!