MemPool.jl
Caching error on Linux when removing cache files
Appears sometimes when the process exits:
IOError: unlink("/home/krynju/.mempool/sess-utvz1V-1/h2x1LD/jl_N2bctMjqbi"): no such file or directory (ENOENT)
Stacktrace:
[1] uv_error
@ ./libuv.jl:97 [inlined]
[2] unlink(p::String)
@ Base.Filesystem ./file.jl:972
[3] rm(path::String; force::Bool, recursive::Bool)
@ Base.Filesystem ./file.jl:283
[4] rm(path::String; force::Bool, recursive::Bool) (repeats 2 times)
@ Base.Filesystem ./file.jl:294
[5] (::MemPool.var"#203#206"{Int64})()
@ MemPool ~/.julia/packages/MemPool/Ggdm4/src/MemPool.jl:163
[6] _atexit()
@ Base ./initdefs.jl:372
This sounds like the rm(...; recursive=true) call in our atexit cleanup hook is racing with the eviction process. It's not technically possible to ensure that all files are cleaned up in time, so we could pass force=true to ignore these errors, but that does make me feel slightly uncomfortable for unknown reasons. Thoughts?
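A minimal sketch of the force=true option follows. The session directory and the helper name cleanup_session_dir are hypothetical stand-ins; this is not MemPool's actual exit hook. In Base, rm passes force down while it recurses, so an entry that disappears between listing the directory and unlinking it is skipped instead of raising ENOENT.

```julia
# Hypothetical session cache directory standing in for ~/.mempool/sess-*;
# MemPool's real exit hook and directory layout are not reproduced here.
session_dir = mktempdir()

# Hypothetical helper: remove the whole session directory at exit.
# force=true makes rm ignore ENOENT ("no such file or directory"), including
# for entries that vanish while rm is recursing (e.g. deleted by eviction).
function cleanup_session_dir(dir::AbstractString)
    rm(dir; recursive=true, force=true)
    return nothing
end

atexit(() -> cleanup_session_dir(session_dir))
```

Without force=true, rm on a path that is already gone throws an IOError whose code is Base.UV_ENOENT, which matches the unlink(...): no such file or directory (ENOENT) lines in the traces. Catching only that error code around the whole recursive rm would be narrower, but it would stop at the first missing file and leave the remaining files behind.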
FYI, I have also sometimes seen this issue on WSL 2 Ubuntu when exiting Julia.
Also, it might actually be reproducible: I've gotten this error three times in a row with the MWE in https://github.com/JuliaParallel/DTables.jl/issues/60#issuecomment-1808665528, but with enable_disk_caching!(50, 10^2 * 20) (and I just realized my typo; I meant 2^10 * 20) inserted after loading packages:
julia> include("mwe.jl")
julia> for i = 1:100 main() end
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
From worker 2: ┌ Info:
From worker 2: └ length(dt3) = 233930
ERROR: On worker 2:
AssertionError: Failed to migrate 183.839 MiB for ref 349
Stacktrace:
[1] #105
@ ~/.julia/packages/MemPool/l9nLj/src/storage.jl:887
[2] with_lock
@ ~/.julia/packages/MemPool/l9nLj/src/lock.jl:80
[3] #sra_migrate!#103
@ ~/.julia/packages/MemPool/l9nLj/src/storage.jl:849
[4] sra_migrate!
@ ~/.julia/packages/MemPool/l9nLj/src/storage.jl:826 [inlined]
[5] write_to_device!
@ ~/.julia/packages/MemPool/l9nLj/src/storage.jl:817
[6] #poolset#160
@ ~/.julia/packages/MemPool/l9nLj/src/datastore.jl:386
[7] #tochunk#139
@ ~/.julia/packages/Dagger/M13n0/src/chunks.jl:267
[8] tochunk (repeats 2 times)
@ ~/.julia/packages/Dagger/M13n0/src/chunks.jl:259 [inlined]
[9] #DTable#1
@ ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:38
[10] DTable
@ ~/.julia/packages/DTables/BjdY2/src/table/dtable.jl:28
[11] #create_dt_from_cols#9
@ ~/tmp/mwe.jl:76
[12] create_dt_from_cols
@ ~/tmp/mwe.jl:68 [inlined]
[13] update_value_col!
@ ~/tmp/mwe.jl:88
[14] query
@ ~/tmp/mwe.jl:27
[15] #invokelatest#2
@ ./essentials.jl:819 [inlined]
[16] invokelatest
@ ./essentials.jl:816
[17] #110
@ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
[18] run_work_thunk
@ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
[19] macro expansion
@ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
[20] #109
@ ./task.jl:514
Stacktrace:
[1] remotecall_fetch(::Function, ::Distributed.Worker; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:465
[2] remotecall_fetch(::Function, ::Distributed.Worker)
@ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:454
[3] #remotecall_fetch#162
@ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
[4] remotecall_fetch
@ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
[5] main
@ ~/tmp/mwe.jl:19 [inlined]
[6] top-level scope
@ ./REPL[2]:1
julia> # Exit Julia
┌ Warning: Worker 3 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:529
┌ Warning: Worker 5 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:529
┌ Warning: Worker 4 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:529
From worker 2: IOError: unlink("/home/steven/.mempool/sess-Qsvl77-2/RHtbsR/jl_JWnIX2z29e"): no such file or directory (ENOENT)
From worker 2: Stacktrace:
From worker 2: [1]┌ Error: Fatal error on process 2
From worker 2: │ exception =
From worker 2: │ attempt to send to unknown socket
From worker 2: │ Stacktrace:
From worker 2: │ [1] error(s::String)
From worker 2: │ @ Base ./error.jl:35
From worker 2: │ [2] send_msg_unknown(s::Sockets.TCPSocket, header::Distributed.MsgHeader, msg::Distributed.ResultMsg)
From worker 2: │ @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/messages.jl:99
From worker 2: │ [3] send_msg_now(s::Sockets.TCPSocket, header::Distributed.MsgHeader, msg::Distributed.ResultMsg)
From worker 2: │ @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/messages.jl:115
From worker 2: │ [4] deliver_result(sock::Sockets.TCPSocket, msg::Symbol, oid::Distributed.RRID, value::Nothing)
From worker 2: │ @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:102
From worker 2: │ [5] macro expansion
From worker 2: │ @ ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:302 [inlined]
From worker 2: │ [6] (::Distributed.var"#113#115"{Distributed.CallWaitMsg, Distributed.MsgHeader, Sockets.TCPSocket})()
From worker 2: │ @ Distributed ./task.jl:514
From worker 2: └ @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:106
From worker 2: uv_error
From worker 2: @ ./libuv.jl:100 [inlined]
From worker 2: [2] unlink(p::String)
From worker 2: @ Base.Filesystem ./file.jl:972
From worker 2: [3] rm(path::String; force::Bool, recursive::Bool)
From worker 2: @ Base.Filesystem ./file.jl:283
From worker 2: [4] rm(path::String; force::Bool, recursive::Bool) (repeats 2 times)
From worker 2: @ Base.Filesystem ./file.jl:294
From worker 2: [5] rm
From worker 2: @ ./file.jl:273 [inlined]
From worker 2: [6] exit_hook()
From worker 2: @ MemPool ~/.julia/packages/MemPool/l9nLj/src/MemPool.jl:152
From worker 2: [7] _atexit(exitcode::Int32)
From worker 2: @ Base ./initdefs.jl:416
From worker 2: [8] exit
From worker 2: @ ./initdefs.jl:28 [inlined]
From worker 2: [9] exit()
From worker 2: @ Base ./initdefs.jl:29
From worker 2: [10] #invokelatest#2
From worker 2: @ ./essentials.jl:819 [inlined]
From worker 2: [11] invokelatest(::Any)
From worker 2: @ Base ./essentials.jl:816
From worker 2: [12] (::Distributed.var"#118#120"{Distributed.RemoteDoMsg})()
From worker 2: @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:308
From worker 2: [13] run_work_thunk(thunk::Distributed.var"#118#120"{Distributed.RemoteDoMsg}, print_error::Bool)
From worker 2: @ Distributed ~/programs/julia/julia-1.9.3/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
From worker 2: [14] (::Distributed.var"#117#119"{Distributed.RemoteDoMsg})()
From worker 2: @ Distributed ./task.jl:514
┌ Warning: Worker 2 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/M13n0/src/sch/Sch.jl:529
EDIT: I corrected my typo. Now I don't get the AssertionError, but I still get the IOError when exiting Julia.
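For scale, here is the arithmetic behind the typo. It assumes the second argument to enable_disk_caching! is a disk size limit (treated here as megabytes); under that reading, the typo set a limit roughly ten times smaller than intended.

```julia
# Second argument to enable_disk_caching! as typed vs. as intended.
# Assumption: it is a disk limit in megabytes.
typo_limit     = 10^2 * 20               # 2000
intended_limit = 2^10 * 20               # 20480
ratio = intended_limit / typo_limit      # about 10.24x larger
```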
The IOError is generally harmless; the file will be removed one way or the other (if it isn't, let me know!). The AssertionError should be mostly "fixed" on master, but we might need to be a bit more eager about freeing data to stay within the size bounds we've set.
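To illustrate what "more eager" freeing could look like, here is a hypothetical, much simplified size-bounded store. BoundedStore and store! are invented names, not MemPool's API, and oldest-first eviction is just one possible policy. It evicts entries until a new item fits, and it reports failure instead of asserting when an item is larger than the whole bound.

```julia
# Hypothetical, simplified model of a size-bounded cache device; this is
# NOT MemPool's storage code or its eviction policy.
mutable struct BoundedStore
    limit::Int              # capacity in bytes
    used::Int               # bytes currently held
    order::Vector{Int}      # ref ids, oldest write first
    sizes::Dict{Int,Int}    # ref id => size in bytes
end

BoundedStore(limit::Int) = BoundedStore(limit, 0, Int[], Dict{Int,Int}())

# Store `nbytes` under `ref`, eagerly evicting the oldest entries until the
# new item fits. Returns false (rather than failing an assertion) when the
# item is larger than the whole bound and can never fit.
function store!(s::BoundedStore, ref::Int, nbytes::Int)
    nbytes > s.limit && return false
    # Re-storing an existing ref replaces its old copy.
    if haskey(s.sizes, ref)
        s.used -= pop!(s.sizes, ref)
        deleteat!(s.order, findfirst(==(ref), s.order))
    end
    # Free space oldest-first until the new item fits within the bound.
    while s.used + nbytes > s.limit
        victim = popfirst!(s.order)
        s.used -= pop!(s.sizes, victim)
    end
    push!(s.order, ref)
    s.sizes[ref] = nbytes
    s.used += nbytes
    return true
end
```

For example, with s = BoundedStore(100), calling store!(s, 1, 60) and then store!(s, 2, 60) evicts ref 1 to make room, leaving only ref 2. A 200-byte item is rejected outright because no amount of eviction could make it fit.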