HDF5.jl

safe multi-threaded writing

Open tbenst opened this issue 3 years ago • 4 comments

Hi y'all, I'm not sure where to put this, so I'll dump it in an issue for now in case it helps someone later. I needed a safe multi-threaded way to write data to one h5 file, so I wrote one based on the thread-safe Dict tutorial at Actors.jl. At the moment it just supports indexing, assignment, and closing, but it can make other function calls, like keys, by writing h5(keys) for h5::HDF5server.

# thread-safe write access to HDF5 using Actors
module HDF5ThreadSafe

using Actors, HDF5
import Actors: spawn

struct HDF5server{L}
    lk::L
end
# support arbitrary functions. 
(h5server::HDF5server)(f::Function, args...) = call(h5server.lk, f, args...)
(h5server::HDF5server)() = call(h5server.lk)

# indexing interface
Base.getindex(h5s::HDF5server, key) = call(h5s.lk, getindex, key)
Base.setindex!(h5s::HDF5server, value, key) = call(h5s.lk, setindex!, value, key)
Base.close(h5s::HDF5server) = call(h5s.lk, close)

# hdf5 server behavior
h5server(h5::HDF5.File, f::Function, args...) = f(h5, args...)
h5server(h5::HDF5.File) = show(h5)

# start hdf5 server (constructor)
function hdf5server(filename::AbstractString, mode::AbstractString="r"; swmr=false,
    remote=false)
    h5 = h5open(filename, mode; swmr=swmr)
    HDF5server(spawn(h5server, h5; remote))
end

export HDF5server, hdf5server 

end

Please feel free to close--not really an issue nor pull request.

Edit: here's a usage example:

using HDF5, Actors, HDF5ThreadSafe
import Base.Threads.@threads

h5path = joinpath("/tmp", "test.h5")
h5 = hdf5server(h5path, "w")
@threads for i in 1:10
    h5["test$i"] = collect(i:i+5)
end
close(h5)

# read the datasets back with a plain HDF5.jl handle
h5r = h5open(h5path, "r")
for k in keys(h5r)
    println("$k: ", read(h5r[k]))
end
close(h5r)

tbenst, Jul 23 '21 17:07

This looks pretty interesting. I think it would be great to incorporate this here down the line to allow thread-safe usage of HDF5.

musm, Oct 17 '21 23:10

Note that the HDF5 library optionally implements thread safety:

https://confluence.hdfgroup.org/plugins/servlet/mobile#content/view/48818684

mkitti, Dec 17 '21 15:12

> Note that the HDF5 library optionally implements thread safety:

Yes, that's the better approach. Unfortunately, it has similar performance issues, since HDF5 can be thread-safe but not concurrent.

I switched over to Zarr for concurrency needs.
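
To illustrate what "thread-safe but not concurrent" means in practice, here is a minimal sketch that serializes every write behind a single lock. The path and the name `h5lock` are hypothetical, chosen only for illustration. It assumes every HDF5 call in the program goes through the same lock, and it is not the mechanism built into the HDF5 library itself.

```julia
using HDF5

# Hypothetical path and lock name, for illustration only.
h5path = joinpath(tempdir(), "locked_test.h5")
h5lock = ReentrantLock()

h5open(h5path, "w") do f
    Threads.@threads for i in 1:10
        data = collect(i:i+5)   # computed in parallel, outside the lock
        lock(h5lock) do         # only one thread touches HDF5 at a time
            f["test$i"] = data
        end
    end
end
```

Only the work done outside the lock runs in parallel. The writes themselves still happen one at a time, which is the performance limitation described above.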

tbenst, Dec 17 '21 15:12

My current recommendation for multithreaded writes to an HDF5 file is to use memory mapping, which this package currently supports.
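
A rough sketch of how that could look, assuming a contiguous, uncompressed dataset that has already been written once so its storage is allocated. The path and dataset name are hypothetical. `HDF5.ismmappable` and `HDF5.readmmap` are the package's memory-mapping helpers; whether a particular dataset can be mapped depends on its layout and element type, so the check is kept in.

```julia
using HDF5, Mmap

# Hypothetical path and dataset name, for illustration only.
mmpath = joinpath(tempdir(), "mmap_test.h5")

# Write the full dataset once so that its storage exists in the file.
h5open(mmpath, "w") do f
    f["A"] = zeros(Float64, 6, 10)
end

# Reopen read-write and fill the memory-mapped array from several threads.
h5open(mmpath, "r+") do f
    dset = f["A"]
    if HDF5.ismmappable(dset)
        A = HDF5.readmmap(dset)
        Threads.@threads for i in axes(A, 2)
            A[:, i] .= i        # each thread writes to its own columns
        end
        # Flush to disk when the mapping is a plain Array.
        A isa Array && Mmap.sync!(A)
    end
end
```

Because each thread writes to a disjoint set of columns in ordinary memory, no HDF5 library calls happen inside the threaded loop.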

mkitti, May 30 '22 15:05

Fixed by #1021

simonbyrne, Jan 19 '23 05:01