LazyArrays.jl Logical Operators slower?

I was expecting LazyArrays to outperform a generator when logical operators are involved, but it doesn't. Is this expected? I'm including an MWE, since the package page says it's helpful to report cases where the code turns out slower.

This version is faster:

using Random, BenchmarkTools; Random.seed!(1234)
x = rand(100)

function foo(x)
    condition1(a)     = a > 0.25
    condition2(a)     = a < 0.75    
    all_conditions    = Iterators.map(a ->  condition1(a) && condition2(a) , x)    

    sum(all_conditions)
end

@btime foo($x)

relative to

using Random, BenchmarkTools, LazyArrays; Random.seed!(1234)
x = rand(100)

function foo(x)
    condition1(a)     = a > 0.25
    condition2(a)     = a < 0.75
    all_conditions(a) = condition1(a) && condition2(a)
    
    sum(@~ all_conditions.(x))
end

@btime foo($x)

Sep 30 '23 20:09 alfaromartino

You should include the outputs so we can see what the difference in timings are.

Why do you expect it to "outperform"?

Sep 30 '23 21:09 dlfivefifty

Let me state the problem more generally. I'm writing a note about the benefits of using LazyArrays, but I can't identify when it's better than base generators.

I consistently find that LazyArrays outperforms base generators (used directly or through Iterators.map). But I was surprised that in some specific cases this does not happen. For instance, the first comparison below shows what I usually get: LazyArrays is much faster.

#Iterators.map
using Random, BenchmarkTools; Random.seed!(123)       # setting the seed for reproducibility
x = rand(1_000_000) ; y = rand(1_000_000)

function foo(x,y) 
    lx(a)     = 4 * a^3 + 3 * a^2 + 2 * a + 1
    ly(b)     = 2 * b^3 + 3 * b^2 + 4 * b + 1
    temp(a,b) = lx(a) / ly(b)
    
    sum(Iterators.map(temp, x,y))
end

@btime foo($x, $y)



#LazyArrays
using Random, BenchmarkTools, LazyArrays; Random.seed!(123)       # setting the seed for reproducibility
x = rand(100_000) ; y = rand(100_000)

function foo(x,y) 
    lx(a)     = 4 * a^3 + 3 * a^2 + 2 * a + 1
    ly(b)     = 2 * b^3 + 3 * b^2 + 4 * b + 1
    temp(a,b) = lx(a) / ly(b)
    
    sum(@~ temp.(x,y))
end

@btime foo($x, $y)


############### RESULTS ##############
# x comprising 100 elements
    # Iterators.map : 123.214 ns (0 allocations: 0 bytes)
    # @~            : 44.995 ns (0 allocations: 0 bytes)

# x comprising 100_000 elements
    # Iterators.map : 122.000 μs (0 allocations: 0 bytes)
    # @~            : 39.200 μs (0 allocations: 0 bytes)

# x comprising 1_000_000 elements
    # Iterators.map : 1.232 ms (0 allocations: 0 bytes)
    # @~            : 39.200 μs (0 allocations: 0 bytes)

Another case is weighted means, where foo3 even outperforms the implementation in StatsBase:

temp(x,share) = x * share

foo1(x,y) = sum(temp.(x,y))
foo2(x,y) = sum(temp(a,b) for (a,b) in zip(x,y))
foo3(x,y) = sum(@~ temp.(x,y))

In contrast, with logical operators LazyArrays seems to be less performant. The difference is not that big, but given the previous results I was expecting LazyArrays to be faster here too.

#Iterators.map
using Random, BenchmarkTools; Random.seed!(123)       # setting the seed for reproducibility
x = rand(1_000_000)

function foo(x)
    condition1(a)     = a > 0.25
    condition2(a)     = a < 0.75    
    all_conditions    = Iterators.map(a ->  condition1(a) && condition2(a) , x)    

    sum(all_conditions)
end

@btime foo($x)


#LazyArrays
using Random, BenchmarkTools, LazyArrays; Random.seed!(123)       # setting the seed for reproducibility
x = rand(1_000_000)

function foo(x)
    condition1(a)     = a > 0.25
    condition2(a)     = a < 0.75
    all_conditions(a) = condition1(a) && condition2(a)
    
    sum(@~ all_conditions.(x))
end

@btime foo($x)


############### RESULTS ##############
# x comprising 100 elements
    # @~            : 14.529 ns (0 allocations: 0 bytes)
    # Iterators.map : 8.909 ns (0 allocations: 0 bytes)

# x comprising 100_000 elements
    # @~            : 8.100 μs (0 allocations: 0 bytes)
    # Iterators.map : 7.050 μs (0 allocations: 0 bytes)

# x comprising 1_000_000 elements
    # @~            : 106.200 μs (0 allocations: 0 bytes)
    # Iterators.map : 97.100 μs (0 allocations: 0 bytes)

So, my question is: just from a practical point of view and for scenarios not involving matrices (just vectors), is there any general recommendation about when LazyArrays is ideal? I'd like to write a note with a few rules of thumb, so that the readers can have a rough idea.

Many thanks for your time!!!

Sep 30 '23 22:09 alfaromartino

I believe the relevant code is here:

https://github.com/JuliaArrays/LazyArrays.jl/blob/3ecdcd03d2ee23622c711e5362ae503a166f3a5e/src/lazybroadcasting.jl#L108

It might have to do with the use of @simd, that is, without conditionals it is SIMD-able.

My only thought is that when it's not SIMD-able, perhaps the for loop is doing bounds checking?

Note I didn't write this code, @mcabbott did, so perhaps he has some insight.
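
To make the SIMD and bounds-checking points concrete, here is a minimal sketch using only Base. It is not the LazyArrays implementation; the function name count_between_simd is made up for illustration. It swaps the short-circuiting && for the non-short-circuiting & so the loop body has no branch, and uses @inbounds to skip bounds checks. Whether the compiler actually vectorizes it, and whether it is any faster, would need to be checked with @btime and @code_llvm.

```julia
# Hypothetical helper, for illustration only (not part of LazyArrays).
# Counts the elements of x that lie strictly between 0.25 and 0.75.
function count_between_simd(x)
    s = 0
    # @inbounds removes bounds checks; @simd allows the compiler to reorder
    # the accumulation, which it may or may not turn into SIMD instructions.
    @inbounds @simd for i in eachindex(x)
        # `&` evaluates both comparisons, so there is no short-circuit branch
        s += (x[i] > 0.25) & (x[i] < 0.75)
    end
    return s
end

x = [0.1, 0.3, 0.5, 0.9]
n = count_between_simd(x)
```

The result should agree with count(a -> 0.25 < a < 0.75, x), which gives a quick sanity check before comparing timings.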

Oct 01 '23 08:10 dlfivefifty

I have no memory of this, but making lazy things fast is tricky. Some chance this code is missing a Broadcast.instantiate? Some chance the loop should just be replaced with sum(bc) as Base has got better at handling un-materialised Broadcasted objects.
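
A rough sketch of the sum(bc) idea using only Base, assuming a Julia version where reductions over un-materialised Broadcasted objects are supported. The names between and bc are illustrative, and this is not what LazyArrays currently does internally.

```julia
using Base.Broadcast: broadcasted, instantiate

# Illustrative predicate, same logic as all_conditions in the MWE
between(a) = (a > 0.25) && (a < 0.75)

x = [0.1, 0.3, 0.5, 0.9]

# broadcasted builds a lazy Broadcasted object; instantiate computes its axes.
# No Vector{Bool} is allocated at this point.
bc = instantiate(broadcasted(between, x))

# Base reduces directly over the lazy object
n = sum(bc)
```

Timing sum(bc) against sum(Iterators.map(between, x)) and sum(@~ between.(x)) with @btime would show whether the hand-written loop is where the difference comes from.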

My general expectation would be that Iterators.map and LazyArrays should have the same performance, in simple uses like iteration, and that any deviations from this are more or less bugs, to be fixed on one side or the other. The point of LazyArrays is more that they are array-like, and can be indexed or preserve a shape for broadcasting, which Base's iterators will not do.

Oct 01 '23 12:10 mcabbott

This is great to know. I have a better idea now! Thank you very much!

Oct 01 '23 20:10 alfaromartino