
Writing fixed width strings is tricky

Open david-macmahon opened this issue 4 years ago • 2 comments

While exploring parallel HDF5 via HDF5.jl, I discovered that parallel HDF5 does not support variable-length strings (maybe not any variable-length type?). To store fixed-length strings, I made my own Datatype for HDF5 strings of length N (68 in my case) and then converted my Vector{String} to a vector of SVector{68, UInt8} by doing something like:

julia> dataset[1:2] = @. rpad(["abc","def"], 68) |> collect |> SVector{68, UInt8};

julia> dataset[1:2]
2-element Array{String,1}:
 "abc                                                                 "
 "def                                                                 "

That works, but it is a bit tricky/cumbersome and introduces a dependency on StaticArrays. Is there an easier way to do this using existing HDF5.jl functionality? Maybe involving HDF5.FixedString???

david-macmahon, Feb 16 '21 08:02

Can you simply use the following?

c = "rannndomstring"
ntuple(i -> i <= ncodeunits(c) ? codeunit(c, i) : '\0', 68)

musm, Feb 16 '21 19:02

That's better, as it removes the dependency on StaticArrays (though I had to change '\0' to UInt8(0)), but I was hoping for an existing function that does this. I also discovered that I could create the Datatype by calling datatype(" "^68), so that helps too. I guess this can be considered a feature request and/or a documentation update request. It's certainly not a show stopper. Thanks!

david-macmahon, Feb 16 '21 21:02
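
For reference, the ntuple approach from the thread, with the UInt8(0) correction applied, can be wrapped in a small helper that uses only Base Julia. This is a minimal sketch, not existing HDF5.jl functionality: the name fixedbytes, the pad keyword, and the error on over-long input are all assumptions made for illustration. Whether the resulting tuples can be assigned to a dataset directly, as in the SVector example above, depends on the dataset's datatype and is not verified here.

```julia
# Hypothetical helper (not part of HDF5.jl): pack a String into exactly N bytes.
# Bytes past the end of the string are filled with `pad` (NUL by default).
function fixedbytes(s::String, ::Val{N}; pad::UInt8 = 0x00) where {N}
    n = ncodeunits(s)
    # Refuse to truncate silently; this policy is an assumption of the sketch.
    n <= N || throw(ArgumentError("string has $n bytes, more than the fixed width $N"))
    return ntuple(i -> i <= n ? codeunit(s, i) : pad, Val(N))
end

# Convert a Vector{String} to a Vector{NTuple{68, UInt8}}.
rows = [fixedbytes(s, Val(68)) for s in ["abc", "def"]]

# Space padding, matching the rpad-based example from the original post.
padded = fixedbytes("abc", Val(68); pad = UInt8(' '))

# Round trip: rebuild a String and drop the trailing NUL padding.
roundtrip = rstrip(String(collect(rows[1])), '\0')
```

Passing the width as Val(68) keeps the tuple length known at compile time, which is the same property the SVector{68, UInt8} version relied on, without the StaticArrays dependency.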