HDF5.jl
safe multi-threaded writing
Hi y'all, I'm not sure where to put this, so I'll drop it in an issue for now in case it helps someone later. I needed a safe multi-threaded way to write data to one h5 file, so I wrote one based on the thread-safe Dict tutorial from Actors.jl. At the moment it only supports indexing, assignment, and closing, but it can forward other function calls as well, such as keys, by writing h5(keys) for h5::HDF5server.
# thread-safe write access to HDF5 using Actors
module HDF5ThreadSafe

using Actors, HDF5
import Actors: spawn

struct HDF5server{L}
    lk::L
end

# support arbitrary functions.
(h5server::HDF5server)(f::Function, args...) = call(h5server.lk, f, args...)
(h5server::HDF5server)() = call(h5server.lk)

# indexing interface
Base.getindex(h5s::HDF5server, key) = call(h5s.lk, getindex, key)
Base.setindex!(h5s::HDF5server, value, key) = call(h5s.lk, setindex!, value, key)
Base.close(h5s::HDF5server) = call(h5s.lk, close)

# hdf5 server behavior
h5server(h5::HDF5.File, f::Function, args...) = f(h5, args...)
h5server(h5::HDF5.File) = show(h5)

# start hdf5 server (constructor)
function hdf5server(filename::AbstractString, mode::AbstractString="r"; swmr=false, remote=false)
    h5 = h5open(filename, mode; swmr=swmr)
    HDF5server(spawn(h5server, h5; remote))
end

export HDF5server, hdf5server

end
Please feel free to close this; it's not really an issue or a pull request.
Edit: here's a usage example:
using HDF5, Actors, HDF5ThreadSafe
import Base.Threads.@threads

h5path = joinpath("/tmp", "test.h5")
h5 = hdf5server(h5path, "w")

# each write is forwarded to the actor, which applies them one at a time
@threads for i in 1:10
    h5["test$i"] = collect(i:i+5)
end
close(h5)

# read the datasets back on a single thread
h5r = h5open(h5path, "r")
for k in keys(h5r)
    println("$k: ", read(h5r[k]))
end
close(h5r)
This looks pretty interesting. I think it would be great to incorporate this here down the line to allow thread-safe usage of HDF5.
Note that the HDF5 library optionally implements thread safety:
https://confluence.hdfgroup.org/plugins/servlet/mobile#content/view/48818684
> Note that the HDF5 library optionally implements thread safety:
Yes, that’s the better approach. Unfortunately, it has similar performance issues, since HDF5 can be made thread-safe but not concurrent.
I switched over to Zarr for concurrency needs.
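For readers who want to see what "thread safe but not concurrent" looks like in practice, here is a minimal sketch that serializes all HDF5 calls behind a plain ReentrantLock instead of an actor. It is not the approach from the original post or from the HDF5 library's own thread-safe build; the names h5lock and written and the temporary path are assumptions made for this illustration.

```julia
using HDF5
import Base.Threads: @threads

# Hypothetical path, chosen only for this sketch.
h5path = joinpath(mktempdir(), "locked.h5")
h5lock = ReentrantLock()   # guards every HDF5 call made from the threaded loop

h5open(h5path, "w") do h5
    @threads for i in 1:10
        data = collect(i:i+5)        # computed in parallel, outside the lock
        lock(h5lock) do
            h5["test$i"] = data      # HDF5 calls run one thread at a time
        end
    end
end

# Read everything back on a single thread.
written = h5open(h5path, "r") do h5
    Dict(k => read(h5[k]) for k in keys(h5))
end
```

Only the work done outside the lock runs in parallel; the writes themselves are still sequential, which is the performance limitation described above.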
My current recommendation for multithreaded writes to an HDF5 file is to use memory mapping, which this package currently supports.
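A rough sketch of what that could look like, assuming a contiguous, uncompressed dataset whose storage is already allocated (here forced by writing zeros first). HDF5.ismmappable and HDF5.readmmap are the package's memory-mapping helpers; the dataset name "A", the array size, and the temporary path are assumptions for illustration only.

```julia
using HDF5, Mmap
import Base.Threads: @threads

h5path = joinpath(mktempdir(), "mmap.h5")   # hypothetical path

# Writing the full array up front forces HDF5 to allocate the data on disk.
h5open(h5path, "w") do h5
    h5["A"] = zeros(Float64, 10, 4)
end

h5open(h5path, "r+") do h5
    dset = h5["A"]
    HDF5.ismmappable(dset) || error("dataset cannot be memory-mapped")
    A = HDF5.readmmap(dset)          # array backed by the file's bytes
    @threads for j in axes(A, 2)
        A[:, j] .= j                 # each thread fills its own column
    end
    A isa Array && Mmap.sync!(A)     # flush the mapped pages to disk
end

result = h5read(h5path, "A")
```

Because each thread writes to a disjoint column of ordinary memory, no HDF5 library calls happen inside the threaded loop. This only works for datasets that are not chunked or compressed.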
Fixed by #1021