HDF5.jl
Better storage of vectors of short strings
It would be more efficient to store large arrays of very short strings using a fixed-length string type, since each variable-length string carries an overhead of many bytes.
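The trade-off can be sketched with some back-of-the-envelope arithmetic. This is a language-neutral illustration in plain Python, not HDF5.jl code. `VLEN_OVERHEAD` is a hypothetical per-string cost chosen only for illustration; the real overhead depends on the HDF5 version and file layout. `vlen_bytes` and `fixed_bytes` are made-up helper names.

```python
# ASSUMPTION: VLEN_OVERHEAD is a hypothetical per-string cost (e.g. a heap
# reference plus bookkeeping); the real figure is not taken from HDF5.jl.
VLEN_OVERHEAD = 16  # bytes per string, illustrative only


def vlen_bytes(strings):
    """Variable-length storage: each string's UTF-8 payload plus a fixed overhead."""
    return sum(len(s.encode("utf-8")) + VLEN_OVERHEAD for s in strings)


def fixed_bytes(strings):
    """Fixed-length storage: every element padded to the longest encoded string."""
    width = max((len(s.encode("utf-8")) for s in strings), default=0)
    return width * len(strings)


codes = ["A", "C", "G", "T"] * 250_000  # one million 1-byte strings
print(vlen_bytes(codes))   # 17000000
print(fixed_bytes(codes))  # 1000000
```

When every string is one byte long, the assumed per-string overhead dominates. Padding to a fixed width only loses when a few long strings force a large width on many short ones.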
Sounds fine to me. Presumably, since I/O is the bottleneck, we could afford a certain amount of analysis: for string arrays whose element type is not a leaf type, determine whether a tighter type is possible, optimize the packing, etc.
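The analysis pass described above might look roughly like the following. It is a plain-Python sketch under stated assumptions, not HDF5.jl's API. `StringLayout`, `choose_layout` and `pack_fixed` are hypothetical names. `VLEN_OVERHEAD` is again an illustrative guess. The `"ascii"`/`"utf-8"` charset choice mirrors HDF5's ASCII and UTF-8 string character sets, and the zero-filled padding mirrors its null-padded string type.

```python
from dataclasses import dataclass

VLEN_OVERHEAD = 16  # hypothetical per-string cost in bytes, illustrative only


@dataclass
class StringLayout:
    fixed: bool    # True: use a fixed-length type; False: keep variable-length
    width: int     # bytes per element if fixed
    charset: str   # "ascii" or "utf-8"


def choose_layout(strings):
    """Scan the data once and pick the cheaper string representation."""
    encoded = [s.encode("utf-8") for s in strings]
    # A fixed-length string type needs at least one byte per element.
    width = max(max((len(b) for b in encoded), default=0), 1)
    charset = "ascii" if all(s.isascii() for s in strings) else "utf-8"
    fixed_cost = width * len(encoded)
    vlen_cost = sum(len(b) + VLEN_OVERHEAD for b in encoded)
    return StringLayout(fixed_cost <= vlen_cost, width, charset)


def pack_fixed(strings, width):
    """Concatenate strings into one buffer, null-padding each to `width` bytes."""
    out = bytearray()
    for s in strings:
        b = s.encode("utf-8")
        if len(b) > width:
            raise ValueError(f"{s!r} needs {len(b)} bytes, width is {width}")
        out += b.ljust(width, b"\x00")
    return bytes(out)


layout = choose_layout(["chr1", "chr2", "chrX"])
print(layout)  # StringLayout(fixed=True, width=4, charset='ascii')
print(pack_fixed(["chr1", "X"], layout.width))  # b'chr1X\x00\x00\x00'
```

One caveat of null padding is that strings which themselves end in `\x00` bytes cannot be told apart from padding when read back. A real implementation would need to fall back to variable-length storage (or a different padding mode) in that case.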