HDF5.jl

trying to write BitArray{1} throws exception

Open ilia-kats opened this issue 3 years ago • 17 comments

Apparently, broadcasting a boolean comparison returns a BitArray{1}, which cannot be written to a file:

file = h5open("test.h5", "r+")
file["test"] = 1:10 .< 5

results in

ERROR: MethodError: no method matching strides(::BitArray{1})
Stacktrace:
 [1] stride(::BitArray{1}, ::Int64) at ./abstractarray.jl:396
 [2] write_dataset(::HDF5.Dataset, ::HDF5.Datatype, ::BitArray{1}, ::HDF5.Properties) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1873 (repeats 2 times)
 [3] write_dataset(::HDF5.File, ::String, ::BitArray{1}; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1549
 [4] write_dataset(::HDF5.File, ::String, ::BitArray{1}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1547
 [5] write(::HDF5.File, ::String, ::BitArray{1}; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1592
 [6] write(::HDF5.File, ::String, ::BitArray{1}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1592
 [7] setindex!(::HDF5.File, ::BitArray{1}, ::String; pv::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:905
 [8] setindex!(::HDF5.File, ::BitArray{1}, ::String) at /home/ilia/.julia/packages/HDF5/iH4LA/src/HDF5.jl:891
 [9] top-level scope at none:1

ilia-kats avatar Mar 31 '21 15:03 ilia-kats

Strange, I wonder why strides(BitArray([1 0])) throws.

musm avatar Apr 13 '21 02:04 musm

Any updates on this? I'm trying to write a BitArray and for me it just crashes Julia, without even throwing any errors.

cossio avatar Nov 27 '22 14:11 cossio

Crashing Julia seems worse. Is there a stack trace even?

mkitti avatar Nov 27 '22 15:11 mkitti

I can't reproduce the crash. Now I get the error reported above. I think there might have been an OOM.

cossio avatar Nov 27 '22 15:11 cossio

Is there a particular array you want this to work for? Would converting to a Vector{Bool} work, or do you want a compact representation on disk as well?

mkitti avatar Nov 27 '22 19:11 mkitti

or do you want a compact representation on disk as well?

This. I'm saving a huge array and would like to compress it as much as possible, both on disk and in memory when I load the data back.
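
To put rough numbers on that gap, here is a minimal sketch in plain Julia with no HDF5 involved. A BitVector packs 64 elements into each UInt64 chunk, while a Vector{Bool} spends one byte per element. The element count `n` is an arbitrary illustrative value, and `chunks` is an internal field of Base's BitArray rather than public API.

```julia
n = 1_000_000                          # illustrative size, not taken from the thread

packed = falses(n)                     # BitVector: 1 bit per element
unpacked = Vector{Bool}(packed)        # Vector{Bool}: 1 byte per element

packed_bytes = sizeof(packed.chunks)   # `chunks` is an internal field of BitArray
unpacked_bytes = sizeof(unpacked)

println("BitVector chunks: ", packed_bytes, " bytes")
println("Vector{Bool}:     ", unpacked_bytes, " bytes")
```

So storing the data as Vector{Bool} costs about eight times the space before any compression filter is applied.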

cossio avatar Nov 27 '22 20:11 cossio

julia> v = BitVector([1,0,1,0,1,0,1,0])
8-element BitVector:
 1
 0
 1
 0
 1
 0
 1
 0

julia> v.chunks
1-element Vector{UInt64}:
 0x0000000000000055

julia> v.dims
(498509808336,)

julia> v.len
8

julia> v.chunks[1] |> bitstring
"0000000000000000000000000000000000000000000000000000000001010101"

You could just try to serialize the components (the chunks and the length) and rebuild the BitVector from them when loading. I will have to do more research into whether there is a native way to handle this with HDF5 types.
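
As a rough sketch of what "serialize the components" could look like, independent of any storage backend: the helper names `to_components` and `from_components` below are hypothetical, and the code relies on the internal `chunks` field of Base's BitArray, which is not public API and could change between Julia versions.

```julia
# Split a BitArray into plain pieces (a Vector{UInt64} and a size tuple)
# that any storage backend can hold.
# NOTE: `chunks` is an internal field of Base.BitArray, not public API.
function to_components(a::BitArray)
    return (chunks = copy(a.chunks), dims = size(a))
end

# Rebuild the BitArray from the stored pieces.
function from_components(chunks::Vector{UInt64}, dims::Tuple)
    a = BitArray(undef, dims...)
    length(a.chunks) == length(chunks) ||
        throw(DimensionMismatch("chunk count does not match dims"))
    copyto!(a.chunks, chunks)   # overwrite every chunk, including the last partial one
    return a
end

# Deterministic 5x70 test pattern (350 bits, so 6 chunks of 64 bits).
a = BitArray(isodd(i + j) for i in 1:5, j in 1:70)
c = to_components(a)
b = from_components(c.chunks, c.dims)
println(b == a)
```

Storing `dims` instead of only the length lets the same approach cover multidimensional BitArrays, not just BitVector.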

mkitti avatar Nov 27 '22 20:11 mkitti

Could something like this be implemented natively in the package?

3f6a avatar Apr 01 '23 23:04 3f6a

Hello!

Is there still no workaround for this? It would be nice to be able to do this.

AhmedSalih3d avatar Apr 11 '24 18:04 AhmedSalih3d

I gave several workarounds above.

Could you convert this to a Vector{Bool} and then save it?

julia> function save_bit_vector(filename::String, v::BitVector)
           h5open(filename, "w") do h5f
               h5f["bitvector"] = Vector{Bool}(v)
           end
       end
save_bit_vector (generic function with 1 method)

julia> function load_bit_vector(filename::String)
           h5open(filename, "r") do h5f
               BitVector(h5f["bitvector"][])
           end
       end
load_bit_vector (generic function with 1 method)

julia> v = BitVector([true,true,false,true,true,true])
6-element BitVector:
 1
 1
 0
 1
 1
 1

julia> save_bit_vector("bitvector.h5", v)
6-element Vector{Bool}:
 1
 1
 0
 1
 1
 1

julia> load_bit_vector("bitvector.h5")
6-element BitVector:
 1
 1
 0
 1
 1
 1

mkitti avatar Apr 11 '24 20:04 mkitti

How about something like the following?

julia> function load_bit_vector(filename::String)
           h5open(filename, "r") do h5f
               chunks = read(h5f["bitvector"])  # read the UInt64 chunks into memory
               len = attrs(h5f["bitvector"])["len"]
               v = BitVector(undef, len)
               v.chunks .= chunks
               return v
           end
       end
load_bit_vector (generic function with 1 method)

julia> function save_bit_vector(filename::String, v::BitVector)
           h5open(filename, "w") do h5f
               h5f["bitvector"] = v.chunks
               A = attrs(h5f["bitvector"])
               A["len"] = v.len
           end
       end
save_bit_vector (generic function with 1 method)

julia> v = BitVector([true,true,false,true,true,true])
6-element BitVector:
 1
 1
 0
 1
 1
 1

julia> save_bit_vector("bitvector.h5", v)
6

julia> load_bit_vector("bitvector.h5")
6-element BitVector:
 1
 1
 0
 1
 1
 1

julia> run(`$(HDF5_jll.h5dump()) bitvector.h5`);
HDF5 "bitvector.h5" {
GROUP "/" {
   DATASET "bitvector" {
      DATATYPE  H5T_STD_U64LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 59
      }
      ATTRIBUTE "len" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SCALAR
         DATA {
         (0): 6
         }
      }
   }
}
}
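
The value 59 in the dump is simply the packed form of the six booleans. A small sketch to check that by hand, assuming the layout seen earlier in this thread (element i sits in bit i-1 of the first chunk) and again using the internal `chunks` field:

```julia
v = BitVector([true, true, false, true, true, true])

chunk = v.chunks[1]                 # internal field: the first (and only) UInt64 chunk
println(Int(chunk))                 # decimal value of the packed bits

# Decode bit i-1 of the chunk back into element i.
decoded = [((chunk >> (i - 1)) & 0x01) == 0x01 for i in 1:length(v)]
println(decoded == v)
```

Elements 1, 2, 4, 5 and 6 are set, which gives 1 + 2 + 8 + 16 + 32 = 59.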

mkitti avatar Apr 11 '24 20:04 mkitti

I dug into this a little bit, and I am a little bothered by what I found. There is an H5T_BITFIELD type class that seems like it would be appropriate for this application. The one standard type that is a bitfield is H5T_NATIVE_B8, which should be an 8-bit bitfield. For some reason, previous maintainers chose to map this to Bool, as in https://github.com/JuliaIO/HDF5.jl/pull/540 .

Supposedly this was for compatibility with PyTables. However, the PyTables documentation currently states the following.

H5T_BITFIELD: This class is used to represent the Bool type. Such a type must be build using a H5T_NATIVE_B8 datatype, followed by a HDF5 H5Tset_precision call to set its precision to be just 1 bit.

julia> t = HDF5.API.h5t_copy(HDF5.API.H5T_NATIVE_B8)
216172782113784267

julia> HDF5.API.h5t_set_precision(t, 1)

julia> dt = HDF5.Datatype(t)
HDF5.Datatype: undefined integer
         size: 1 bytes
    precision: 1 bits
       offset: 0 bits
        order: little endian byte order

mkitti avatar Apr 11 '24 21:04 mkitti

Hi @mkitti !

Thanks for the very quick answer. Yes, converting to Vector{Bool} does work as you mention, thanks!

Unfortunately, for my use case the h5 file has to match the hdfvtk format and perhaps what you mention above is the issue, that it does not correctly convert the Bool?

ParaView, which reads the hdfvtk file, should be able to read (U)Int8 values.

I am just explaining where I hit the issue, in case someone else finds the thread later and wonders how this came up.

AhmedSalih3d avatar Apr 11 '24 21:04 AhmedSalih3d

What does hdfvtk expect?

mkitti avatar Apr 11 '24 21:04 mkitti

I am running a simulation right now, so I can try to come back with exact details later, but I think it struggles to convert the logical true/false in Julia to a meaningful representation, whereas it would probably be able to read the bitvector values if they are passed to it as (U)Int8.

AhmedSalih3d avatar Apr 11 '24 21:04 AhmedSalih3d

What field are you trying to populate? You could convert your data to a Vector{UInt8} before saving as well. Then VTK would read a UInt8.

mkitti avatar Apr 11 '24 22:04 mkitti

What field are you trying to populate? You could convert your data to a Vector{UInt8} before saving as well. Then VTK would read a UInt8.

Great reply! This was exactly what I needed to do, and conveniently it needs no other code changes on my side, since UInt8(1) == true evaluates to true in Julia.
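
For anyone landing here later, a minimal sketch of that conversion in plain Julia. The variable names are illustrative, and the resulting Vector{UInt8} is what would then be passed to the HDF5 write call:

```julia
mask = (1:10) .< 5                 # a BitVector, as in the original report

as_bytes = Vector{UInt8}(mask)     # one UInt8 per element: 0x01 for true, 0x00 for false
restored = as_bytes .!= 0          # back to a BitVector after loading

println(eltype(as_bytes))
println(as_bytes == mask)          # numeric comparison, so 0x01 == true holds
println(restored == mask)
```

The trade-off is the same as with Vector{Bool}: one byte per element on disk instead of one bit.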

AhmedSalih3d avatar Apr 12 '24 09:04 AhmedSalih3d