julia
julia copied to clipboard
Race Condition: `Test.@test_logs` is not thread safe.
EDIT: changed to a much simpler example that shows the issue.
The @test_logs
macro is not thread safe, if the code under test uses multiple threads:
julia> function foo(n)
@info "Doing foo with n=$n"
@sync for i=1:n
Threads.@spawn @info "Iteration $i"
end
42
end
foo (generic function with 1 method)
julia> for _ in 1:1000
@test_logs (:info,"Doing foo with n=10000") match_mode=:any foo(10_000)
end
julia+RAI(56155,0x1f5157ac0) malloc: double free for ptr 0x128248000
julia+RAI(56155,0x1f5157ac0) malloc: *** set a breakpoint in malloc_error_break to debug
[56155] signal (6): Abort trap: 6
in expression starting at REPL[8]:1
signal (6) thread (1) __pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 54370881 (Pool: 54363911; Big: 6970); GC: 119
[1] 56154 abort julia -tauto
This is definitely a bug, and should be fixed.
We see this showing up in our CI build farm from time to time.
The solution is almost certainly to make that vector thread-safe, by introducing a lock around it, and then copying the vector out before returning.
EDIT: I've edited the original message to the much more compelling example @test_logs
example.
Moving the previous description here, as a secondary example:
Any Task spawned during a Test, which logs after Test.collect_test_logs()
returns can trigger a data race, where the spawned task logs into the logs
vector, while the caller is reading from that vector.
Here is a contrived example that I managed to get to reproduce the issue:
julia> function code_to_test()
# do the work
@info "starting"
start_transaction()
# ...
@info "done"
# Clean up
end_transaction()
end
code_to_test (generic function with 1 method)
julia> start_transaction() = nothing
start_transaction (generic function with 1 method)
julia> function end_transaction()
# spawn a task to do some cleanup in the background, so we can
# return to the client immediately
Threads.@spawn begin
# clean stuff up
@info "Background Cleanup Task: All done cleaning up!"
for _ in 1:1000
@info "now it's clean, goodbye"
end
end
end
end_transaction (generic function with 1 method)
julia> using Test
julia> for _ in 1:10000
let (logs,_) = Test.collect_test_logs() do
@info "hey"
code_to_test()
@warn "hi"
end
for log in logs
@test length(log.message) > 1
end
end
end
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] iterate(A::Vector{LogRecord}, i::Int64)
@ Base ./array.jl:945
[3] top-level scope
@ REPL[5]:10
This also sometimes segfaults, of course:
[66213] signal (11.2): Segmentation fault: 11
in expression starting at REPL[5]:1
signal (11) thread (1) sig_match_fast at /Users/nathandaly/builds/julia-RAI/src/gf.c:2914 [inlined]
signal (11) thread (1) jl_lookup_generic_ at /Users/nathandaly/builds/julia-RAI/src/gf.c:2994 [inlined]
signal (11) thread (1) ijl_apply_generic at /Users/nathandaly/builds/julia-RAI/src/gf.c:3073
We see this showing up in our CI build farm from time to time.
The solution is almost certainly to make that vector thread-safe, by introducing a lock around it, and then copying the vector out before returning.