DataFrames.jl
DataFrames.jl copied to clipboard
check if DataFrames.jl is affected by Threads.@spawn bug
x-ref https://github.com/JuliaLang/julia/issues/40626, https://github.com/JuliaData/CSV.jl/pull/1046, https://github.com/JuliaData/CSV.jl/issues/1045
I have checked and DataFrames.jl is affected by this issue.
The question is if running this function after every @sync
of spawned tasks:
function _clear_thread_state()
Threads.@threads :static for _ in 1:Threads.nthreads()
nothing
end
end
is good enough to ensure clearing thread state and will not cause problems in case some other tasks are spawned independently what DataFrames.jl does.
CC @tkf @nalimilan @quinnj
PR to CSV.jl with an alternative approach that I think may be more all-around correct: https://github.com/JuliaData/CSV.jl/pull/1058 (it links to 2 other issues in JuliaLang/julia & CSV.jl where I discuss recent findings/thoughts in more detail).
After https://github.com/JuliaServices/WorkerUtilities.jl/pull/12 discussion is finalized this issue should be resolved.
We need to ensure that both passed parameters and returned values in Task
can be eventually GC'ed.
If it is not possible to GC returned values we need to redesign DataFrames.jl to avoid returning values from Task
.
@quinnj (maybe @tkf or @vchuravy):
would just setting the result
field in Task
manually after it is processed be enough for the second problem?
What I mean is if after the line https://github.com/JuliaData/DataFrames.jl/blob/1b9d0374dbe5669c17a8342216de64781b14ca20/src/groupeddataframe/splitapplycombine.jl#L760
we would add
t.result = nothing
would it solve the issue of return value?
Yes, that would avoid holding references to the return value, as far as I understand.
Yes that would suffice, if you can guarantee that only one user will call fetch
alternativly you could use a 1-element channel. Which would have the additional benefit of being potentially type-stable.