JLD.jl

A simple stack corruption case with HDF5 reader and JLD writer

Open gloine opened this issue 9 years ago • 0 comments

I ran into a strange error while writing some Julia code for my project and distilled it down to the code below. It reads data from an HDF5 file in a separate Task and, in the main loop, saves calculated results to a separate JLD file.

using JLD
using HDF5

h5open("a.hdf5", "w") do file
    a = reshape([1],1,1,1,1)
    file["data"] = a
end

function io_task_impl()
    while true
        file = h5open("a.hdf5", "r")
        produce((1,1,file["data"]))
        close(file)
        produce((1,0,nothing))
    end
end

io_task = Task(io_task_impl)
while true
    jldopen("aaa.jld", "w") do file
        write(file, "arglist", (Float64, Vector([28*28, 300, 10]), 100))
    end
    d = consume(io_task)
end

The code above produces the following error after a few loops:

HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
  #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 788 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dcontig.c line 580 in H5D__contig_write(): contiguous write failed
    major: Dataset
    minor: Write failed
  #004: H5Dscatgath.c line 678 in H5D__scatgath_write(): datatype conversion failed
    major: Dataset
    minor: Can't convert datatypes
  #005: H5T.c line 4816 in H5T_convert(): data type conversion failed
    major: Attribute
    minor: Unable to encode value
  #006: H5Tconv.c line 2571 in H5T__conv_struct_opt(): unable to convert compound datatype member
    major: Datatype
    minor: Unable to initialize object
  #007: H5T.c line 4816 in H5T_convert(): data type conversion failed
    major: Attribute
    minor: Unable to encode value
  #008: H5Tconv.c line 2172 in H5T__conv_struct(): not a datatype
    major: Datatype
    minor: Inappropriate type
ERROR: Error writing dataset
 in h5d_write at /Users/gloine/.julia/v0.5/HDF5/src/plain.jl:1928
 [inlined code] from /Users/gloine/.julia/v0.5/HDF5/src/plain.jl:1803
 in write_compound at /Users/gloine/.julia/v0.5/JLD/src/JLD.jl:699
 in write at /Users/gloine/.julia/v0.5/JLD/src/JLD.jl:687
 in write at /Users/gloine/.julia/v0.5/JLD/src/JLD.jl:509
 in anonymous at none:3
 in jldopen at /Users/gloine/.julia/v0.5/JLD/src/JLD.jl:245
 [inlined code] from /Users/gloine/.julia/v0.5/JLD/src/JLD.jl:243
 in anonymous at no file:0
 in eval at /Applications/Julia-0.5.0-dev-b0a84f7a3b.app/Contents/Resources/julia/lib/julia/sys.dylib

Depending on what code I insert in between, it sometimes segfaults and sometimes gives the error above. The code writes the same tuple to the JLD file on every iteration, so a random failure is unexpected.

I am using the latest master branch (could be several commits behind) on Mac OS X El Capitan. I read that HDF5 is not thread-safe by default. Would installing a thread-safe build of HDF5 solve the problem above?

Thanks, Gloine

gloine, Jan 25 '16 13:01