HDF5.jl
trying to write BitArray{1} throws exception
Apparently, vectorizing a boolean operation returns a BitArray{1}. This cannot be written to a file:
file = h5open("test.h5", "r+")
file["test"] = 1:10 .< 5
results in
ERROR: MethodError: no method matching strides(::BitArray{1})
Stacktrace:
[1] stride(::BitArray{1}, ::Int64) at ./abstractarray.jl:396
[2] write_dataset(::HDF5.Dataset, ::HDF5.Datatype, ::BitArray{1}, ::HDF5.Properties) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1873 (repeats 2 times)
[3] write_dataset(::HDF5.File, ::String, ::BitArray{1}; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1549
[4] write_dataset(::HDF5.File, ::String, ::BitArray{1}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1547
[5] write(::HDF5.File, ::String, ::BitArray{1}; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1592
[6] write(::HDF5.File, ::String, ::BitArray{1}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1592
[7] setindex!(::HDF5.File, ::BitArray{1}, ::String; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:905
[8] setindex!(::HDF5.File, ::BitArray{1}, ::String) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:891
[9] top-level scope at none:1
Strange, I wonder why strides(BitArray([1 0])) throws.
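For anyone landing here with the same error, a minimal sketch of what seems to be going on, using only Base Julia: a BitArray stores one bit per element, so it has no per-element memory stride, whereas a Vector{Bool} stores one byte per element and does. The variable names are illustrative, and the exact exception may differ between Julia versions.

```julia
# Broadcasting a comparison over a range produces a bit-packed BitVector.
mask = 1:10 .< 5
println(typeof(mask))

# Record whether strides() fails for the bit-packed array
# (it did in the report above; other Julia versions may behave differently).
strides_failed = try
    strides(mask)
    false
catch
    true
end
println("strides(mask) failed: ", strides_failed)

# Converting to a dense Vector{Bool} stores one byte per element,
# which is ordinary strided memory.
dense = Vector{Bool}(mask)
println(strides(dense))
```

The dense copy holds the same values as the mask, so it is the kind of array the write path in the stack trace above expects.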
Any updates on this? I'm trying to write a BitArray and for me it just crashes Julia, without even throwing any errors.
Crashing Julia seems worse. Is there a stack trace even?
I can't reproduce the crash. Now I get the error reported above. I think there might have been an OOM.
Is there a particular array you want this to work for? Would converting to a Vector{Bool} work, or do you want a compact representation on disk as well?
or do you want a compact representation on disk as well?
This. I'm saving a huge array and would like to compress it as much as possible, both on disk and in memory when I load the data back.
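To put rough numbers on the difference, here is a small sketch in Base Julia comparing the in-memory payload of a BitVector with its Vector{Bool} equivalent. It reads the `chunks` field, which is an internal detail of BitArray rather than public API, so treat it as an illustration only.

```julia
n = 1_000_000
bits = trues(n)               # BitVector: one bit per element
bytes = Vector{Bool}(bits)    # dense copy: one byte per element

# `chunks` is an internal BitArray field holding the packed UInt64 words.
packed_size = sizeof(bits.chunks)
dense_size = sizeof(bytes)
println("packed: ", packed_size, " bytes, dense: ", dense_size, " bytes")
```

The packed form is about eight times smaller, which is why converting to Vector{Bool} only solves the writing problem and not the size problem.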
julia> v = BitVector([1,0,1,0,1,0,1,0])
8-element BitVector:
1
0
1
0
1
0
1
0
julia> v.chunks
1-element Vector{UInt64}:
0x0000000000000055
julia> v.dims
(498509808336,)
julia> v.len
8
julia> v.chunks[1] |> bitstring
"0000000000000000000000000000000000000000000000000000000001010101"
You could just try to serialize the components and reload them from there. I will have to do more research into whether there is a native way to handle this with HDF5 types.
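A sketch of that idea for arrays of any dimension, independent of the storage layer: split the BitArray into its packed words plus its size, and rebuild it from those two pieces. `pack_bits` and `unpack_bits` are hypothetical helper names, and the code relies on the internal `chunks` field, so it is an assumption that this layout stays stable across Julia versions.

```julia
# Hypothetical helpers: split a BitArray into plain data that any storage
# layer can hold, and rebuild it later.
# NOTE: `chunks` is an internal field of BitArray, not a documented API.
function pack_bits(A::BitArray)
    return copy(A.chunks), size(A)
end

function unpack_bits(chunks::Vector{UInt64}, dims::Dims)
    A = BitArray(undef, dims)
    # Guard against chunks that do not belong to an array of this size.
    length(A.chunks) == length(chunks) ||
        throw(DimensionMismatch("chunk count does not match dims"))
    A.chunks .= chunks
    return A
end

A = (1:3) .> (1:5)'           # 3×5 BitMatrix from a broadcast comparison
chunks, dims = pack_bits(A)
B = unpack_bits(chunks, dims)
println(B == A)
```

The packed words are a plain Vector{UInt64} and the size is a tuple of integers, so both can be stored as an ordinary dataset and attribute.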
Could something like this be implemented natively in the package?
Hello!
Still no workaround for this? It would be nice to be able to achieve this.
I gave several workarounds above.
Could you convert this to a Vector{Bool} and then save it?
julia> function save_bit_vector(filename::String, v::BitVector)
           h5open(filename, "w") do h5f
               h5f["bitvector"] = Vector{Bool}(v)
           end
       end
save_bit_vector (generic function with 1 method)
julia> function load_bit_vector(filename::String)
           h5open(filename, "r") do h5f
               BitVector(h5f["bitvector"][])
           end
       end
load_bit_vector (generic function with 1 method)
julia> v = BitVector([true,true,false,true,true,true])
6-element BitVector:
1
1
0
1
1
1
julia> save_bit_vector("bitvector.h5", v)
6-element Vector{Bool}:
1
1
0
1
1
1
julia> load_bit_vector("bitvector.h5")
6-element BitVector:
1
1
0
1
1
1
How about something like the following?
julia> function load_bit_vector(filename::String)
           h5open(filename, "r") do h5f
               chunks = h5f["bitvector"]
               len = attrs(h5f["bitvector"])["len"]
               v = BitVector(undef, len)
               v.chunks .= chunks
               return v
           end
       end
load_bit_vector (generic function with 1 method)
julia> function save_bit_vector(filename::String, v::BitVector)
           h5open(filename, "w") do h5f
               h5f["bitvector"] = v.chunks
               A = attrs(h5f["bitvector"])
               A["len"] = v.len
           end
       end
save_bit_vector (generic function with 1 method)
julia> v = BitVector([true,true,false,true,true,true])
6-element BitVector:
1
1
0
1
1
1
julia> save_bit_vector("bitvector.h5", v)
6
julia> load_bit_vector("bitvector.h5")
6-element BitVector:
1
1
0
1
1
1
julia> run(`$(HDF5_jll.h5dump()) bitvector.h5`);
HDF5 "bitvector.h5" {
GROUP "/" {
DATASET "bitvector" {
DATATYPE H5T_STD_U64LE
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): 59
}
ATTRIBUTE "len" {
DATATYPE H5T_STD_I64LE
DATASPACE SCALAR
DATA {
(0): 6
}
}
}
}
}
I dug into this a little bit, and I am a little bothered by what I found. There is an H5T_BITFIELD type class that seems like it would be appropriate for this application. The one standard type that is a bitfield is H5T_NATIVE_B8, which should be an 8-bit bitfield. For some reason, previous maintainers chose to map this to Bool, as in https://github.com/JuliaIO/HDF5.jl/pull/540 .
Supposedly this was for compatibility with PyTables. However, the PyTables documentation currently states the following.
H5T_BITFIELD: This class is used to represent the Bool type. Such a type must be build using a H5T_NATIVE_B8 datatype, followed by a HDF5 H5Tset_precision call to set its precision to be just 1 bit.
julia> t = HDF5.API.h5t_copy(HDF5.API.H5T_NATIVE_B8)
216172782113784267
julia> HDF5.API.h5t_set_precision(t, 1)
julia> dt = HDF5.Datatype(t)
HDF5.Datatype: undefined integer
size: 1 bytes
precision: 1 bits
offset: 0 bits
order: little endian byte order
Hi @mkitti !
Thanks for the very quick answer. Yes, converting to Vector{Bool} does work as you mention, thanks!
Unfortunately, for my use case the h5 file has to match the hdfvtk format, and perhaps what you mention above is the issue: that it does not correctly convert the Bool?
Paraview, which reads the hdfvtk file, should be able to read (U)Int8 values etc.
I am just explaining where I hit the issue, in case someone else finds the thread later and wonders how
What does hdfvtk expect?
I am running a simulation right now, so I can try to come back with exact details later, but I think it struggles to convert the logical true/false in Julia to a meaningful representation, whereas it would probably be able to read the bitvector values if they are passed to it as (U)Int8.
What field are you trying to populate? You could convert your data to a Vector{UInt8} before saving as well. Then VTK would read a UInt8.
Great reply! This was exactly what I needed to do, and conveniently it required no other code changes on my side, since UInt8(1) == true evaluates to true in Julia.
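For later readers, a minimal sketch of that conversion in Base Julia. The variable names are illustrative, and whether a given reader such as VTK or Paraview accepts the result depends on the field being populated.

```julia
mask = 1:10 .< 5         # BitVector from a broadcast comparison
as_u8 = UInt8.(mask)     # Vector{UInt8} holding 0x00 and 0x01

println(typeof(as_u8))
println(UInt8(1) == true)   # Bool and UInt8 compare by numeric value
println(as_u8 == mask)      # element-wise equal to the original mask
```

Because the comparison is numeric, code that tests the values against true or false keeps working after the conversion.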