Should indexing return a LazyRow by default?
It seems like a gotcha that indexing into a StructArray will materialize a new struct.
Maybe it should return a LazyRow by default?
If all the fields are isbits then materializing a new struct does not allocate and is a reasonable thing to do. If your StructArray is large, OTOH, you should probably iterate using LazyRows(s::StructArray).
I'd rather not change the default, as many users use StructArray for small, isbits structs, like ComplexF64 and expect the correct type being returned, but I agree that the performance characteristics in the non-isbits (or "wide") case should be better documented. I'm keeping this issue open as a reminder to add docs about this.
Thanks. My use case is not iterating but indexing into given elements. The (native) code generated for that (using LazyRow) seems to be surprisingly complicated, so I'm worried about performance.
Benchmarking on a cheap operation (like sum) does not reveal a difference:
julia> using StructArrays, BenchmarkTools
julia> s = StructArray(a = rand(1000), b = rand(1000));
julia> @btime sum(LazyRow($s, i).a for i in 1:1000);
1.038 μs (0 allocations: 0 bytes)
julia> @btime sum($s.a[i] for i in 1:1000);
1.037 μs (0 allocations: 0 bytes)
Regarding the native code complexity, it may be a matter of adding @inbounds. If one uses getproperty within @inbounds the native code seems fine, compare:
julia> s = StructArray(a = rand(10), b = rand(10));
julia> l = LazyRow(s, 1);
julia> f(l) = (ret = @inbounds l.a; ret)
f (generic function with 1 method)
julia> @code_native f(l)
.text
; ┌ @ REPL[30]:1 within `f'
; │┌ @ REPL[30]:1 within `getproperty'
movq (%rdi), %rax
; │└
; │┌ @ lazy.jl:9 within `getproperty'
; ││┌ @ array.jl:744 within `getindex'
movq 8(%rdi), %rcx
; ││└
; ││ @ lazy.jl:9 within `getproperty' @ structarray.jl:126
; ││┌ @ structarray.jl:124 within `fieldarrays'
movq (%rax), %rax
; ││└
movq (%rax), %rax
; │└
; │┌ @ array.jl:744 within `getproperty'
movq (%rax), %rax
vmovsd -8(%rax,%rcx,8), %xmm0 # xmm0 = mem[0],zero
; │└
retq
nopw (%rax,%rax)
; └
with the hand-written version
julia> g(s) = (@inbounds ret = s.a[1]; ret)
g (generic function with 1 method)
julia> @code_native g(s)
.text
; ┌ @ REPL[33]:1 within `g'
; │┌ @ structarray.jl:126 within `getproperty'
; ││┌ @ REPL[33]:1 within `fieldarrays'
movq (%rdi), %rax
; ││└
movq (%rax), %rax
; │└
; │┌ @ array.jl:744 within `getindex'
movq (%rax), %rax
vmovsd (%rax), %xmm0 # xmm0 = mem[0],zero
; │└
retq
nop
; └
This way of using @inbounds may be counter-intuitive, maybe I should do the @boundscheck when generating the LazyRow, not when accessing its fields.
That's nice, thanks. Yes, that is rather counterintuitive; it would be good to hide that inside the implementation.
@piever I came here to ask a similar question. In my case I am using StructArrays for a matrix of composite structs.
A problem is that I want to also segment the elements into Sets. It took me a while to realize why membership checks were always failing, and indeed why
S[i] == S[i] %false
for a given structarray S and index i—the materializations are different objects, while:
LazyRow(S, i) == LazyRow(S, i) %true.
Is it possible to perhaps parameterize what getindex will return? Getting the LazyRow() back would greatly simplify my code and allow consistent patterns with other data structures.