`asyncmap`: silently incorrect results with StructArrays (at least)
MWE:
julia> using StructArrays
# asyncmap() with sleep() inside returns nothings:
julia> asyncmap(StructArray(a=[1,2,3])) do x
sleep(1)
"abc $x"
end
3-element Vector{Nothing}:
nothing
nothing
nothing
# fine without sleep():
julia> asyncmap(StructArray(a=[1,2,3])) do x
"abc $x"
end
3-element Vector{String}:
"abc (a = 1,)"
"abc (a = 2,)"
"abc (a = 3,)"
# fine with map() instead of asyncmap():
julia> map(StructArray(a=[1,2,3])) do x
sleep(1)
"abc $x"
end
3-element Vector{String}:
"abc (a = 1,)"
"abc (a = 2,)"
"abc (a = 3,)"
The bug seems to be in asyncmap, it assumes too much about the collection. asyncmap works like this:
- create array of
Refs in https://github.com/JuliaLang/julia/blob/95f30e51f4158dd3421cd5b8098849d24e97e497/base/asyncmap.jl#L155 - loop over these
Refs (in multiple tasks, but the multi-task aspect doesn't matter) - mutate
ref.xto store the resulting element https://github.com/JuliaLang/julia/blob/95f30e51f4158dd3421cd5b8098849d24e97e497/base/asyncmap.jl#L94
In the MWE above with StructArrays, the asyncrun variable is naturally also a StructArray (it results from map). But extracting an element from a StructArray creates a new object, not a copy – no way around it. So, the r.x = ... assignment doesn't mutate anything in the resulting array that gets returned.
I guess, to be generic, the Channel should store indices in the original array instead of elements?
Actually, my explanation above doesn't explain why it works without sleep. That I don't know...
Could someone apply the "correctness bug" label please? Seems to fit perfectly here.
the bug being that map assumes that mutating the result of getindex at some location will mutate the value in the collection?
that sounds ... reasonable. maybe map(::StructArray) should return something that adheres to this?
given this design:
But extracting an element from a StructArray creates a new object, not a copy
I'm not entirely sure I would call a StructArray as a "collection," although it behaves in many respects like one
the bug being that map assumes that mutating the result of getindex at some location will mutate the value in the collection?
map is totally fine! it's asyncmap that is silently wrong
maybe map(::StructArray) should return something that adheres to this?
like... what?
I'm not entirely sure I would call a StructArray as a "collection,"
It perfectly fits the AbstractArray interface. StructArrays is the most popular package providing a "struct-of-arrays" object that keeps the same interface as a regular array.
adienes added [multithreading]
multithreading doesn't seem to be related
well, to be honest I'm not sure that I'd say the AbstractArray interface is defined precisely and explicitly enough such that it's even possible to adhere to it "perfectly"
I guess the question here is does getindex have to actually get you an element in the array or may it be a mirage, like in StructArrays. that is, must it hold that objectid(getindex(collection, index)) == objectid(getindex(collection, index)) for two separate calls to getindex.
I guess status quo is that the answer is "no," given that this isn't explicitly specified in the docs, but IMO it's a pretty intuitive assumption and I wouldn't be surprised if there were quite a lot of generic code relying on this.
to clarify, I definitely agree there is a bug somewhere, and I am exploring your suggestion to push indices into the Channel rather than refs. but it's just not 100% obvious to me that this is unequivocally an asyncmap correctness bug.
must it hold that objectid(getindex(collection, index)) == objectid(getindex(collection, index)) for two separate calls to getindex
So many practical and existing array types would be excluded by this! Not just StructArrays and other SoA-type packages, but also stuff like lazy arrays.
it's just not 100% obvious to me that this is unequivocally an asyncmap correctness bug
The other two parts in the code is StructArrays and end-user code. I don't see how any of them could behave differently in this situation, really...
I wouldn't be surprised if there were quite a lot of generic code relying on this
I'm a relatively heavy user of StructArrays, and never experienced silent correctness bugs like this. Sure, some functions just fail, like if there's a ::Vector constraint somewhere – but that's loud and clear.
Threaded maps like OhMyThreads.tmap just work: take a structarray and return a correct structarray back. Pretty sure they didn't specifically design their code with structarrays in mind :) Async map shouldn't be harder that threaded, right?
I just spot-checked LazyArrays.BroadcastArray and MappedArrays.MappedArray (not pretending to be an exhaustive review) and you are right it seems neither of these types satisfy arr[idx] === arr[idx].
although also neither of them experience this same asyncmap bug since map returns a normal Array for these.