HDF5.jl
HDF5.jl copied to clipboard
freeze when `hdf5_type_id` on self-referential datatype
For example
using HDF5
struct A
x::A
end
HDF5.hdf5_type_id(A) # hangs
For a recursive type, the function will keep calling itself, leading to an infinite loop. I am not a specialist, but I think what Julia does internally is to store the field x
as a reference even though A
is immutable.
As far as I know, there can be no self-reference cycles across multiple structs, so it is easy to detect this within a single call of hdf5_type_id
. Maybe for now it is okay to just throw an error and fail gracefully in that case. I can implement that if wanted.
One practical example for this problem appearing is when trying to save Measurement
from Measurements.jl
. It contains a field of Measurements.Derivatives
which is self-referential.
I'm trying to figure out if HDF5 compound types support self-referential types like this.
When I had a short look, I couldn’t find it. If it doesn’t, one approach would be to wrap it in a variable length array type that stores all referenced copies together with indices that show their relation.
I just tried this patch and this does not seem to work either:
diff --git a/src/typeconversions.jl b/src/typeconversions.jl
index 1e82c659..a3f3a6b3 100644
--- a/src/typeconversions.jl
+++ b/src/typeconversions.jl
@@ -73,7 +73,8 @@ function hdf5_type_id(::Type{T}, isstruct::Val{true}) where {T}
dtype = API.h5t_create(API.H5T_COMPOUND, sizeof(T))
for (idx, fn) in enumerate(fieldnames(T))
ftype = fieldtype(T, idx)
- API.h5t_insert(dtype, Symbol(fn), fieldoffset(T, idx), hdf5_type_id(ftype))
+ _hdf5_type_id = ftype == T ? dtype : hdf5_type_id(ftype)
+ API.h5t_insert(dtype, Symbol(fn), fieldoffset(T, idx), _hdf5_type_id)
end
return dtype
end
I get the following:
julia> using HDF5
julia> struct A
x::A
end
julia> HDF5.hdf5_type_id(A)
ERROR: HDF5.API.H5Error: Error adding field x to compound datatype
libhdf5 Stacktrace:
[1] H5Tinsert: Invalid arguments to routine/Bad value
can't insert compound datatype within itself
- We could proceed with my patch which essentially allows the HDF5 library itself to error.
- We could just detect the condition and error directly. This may allow for us to provide a more direct and friendlier error message.
Is this even supported by Julia? How do you construct an instance of A
?
ah, you have to do it via inner constructors... the problem is that you end up with an undef
at some point, which I don't think we can really support.
julia> mutable struct A
x::A
function A()
self = new()
self.x = self
return self
end
end
julia> a = A()
A(A(#= circular reference @-1 =#))
julia> pointer_from_objref(a)
Ptr{Nothing} @0x000000770cacabc0
julia> pointer_from_objref(a.x)
Ptr{Nothing} @0x000000770cacabc0
julia> unsafe_load(Ptr{Ptr{Nothing}}(pointer_from_objref(a)))
Ptr{Nothing} @0x000000770cacabc0
For a mutable struct, this just turns into a bunch of pointers.
To store this kind of structure in HDF5 I think we would need some kind of analog to pointers. Either we store the array index to what is being pointed at or we use references some how.
That said I'm not sure if I can think of a way to do that generically.
Actually, I think there are non-obvious ways to create multi-type self-referential cycles that still freeze the patched hdf5_type_id
mutable struct B
x::Any
end
struct A
x::B
end
a = A(B(nothing))
a.x.x=a
So to catch everything, the types encountered so far all need to be kept track of. Serializing this kind of stuff generically seems quite challenging. Maybe it is possible using the HDF5-native reference system?
To store this kind of structure in HDF5 I think we would need some kind of analog to pointers. Either we store the array index to what is being pointed at or we use references some how.
One solution would be to only support isbitstype
s by default: any other Julia types would require some sort of manual conversion function.
I think this would be fine if it errored instead of hung. I think we need a type id cache to break cycles.
Self-referential structs are not isbitstype
so failing for non-isbitstype
s also removes any hangs.
My current solution is the following.
import HDF5: hdf5_type_id, API
function hdf5_type_id(::Type{T}, isstruct::Val{true}) where {T}
cache = try
task_local_storage(:hdf5_type_id_cache)::Dict{DataType,Int}
catch
task_local_storage(:hdf5_type_id_cache, Dict{DataType, Int}())
end
if haskey(cache, T)
error("Cannot create a HDF5 datatype with fields containing that datatype.")
end
dtype = API.h5t_create(API.H5T_COMPOUND, sizeof(T))
cache[T] = dtype
try
for (idx, fn) in enumerate(fieldnames(T))
ftype = fieldtype(T, idx)
_hdf5_type_id = hdf5_type_id(ftype)
API.h5t_insert(dtype, Symbol(fn), fieldoffset(T, idx), _hdf5_type_id)
end
catch err
rethrow(err)
finally
delete!(cache, T)
end
return dtype
end
julia> struct C{A}
x::A
end
julia> struct B{A}
x::C{A}
end
julia> struct A
x::B{A}
end
julia> HDF5.hdf5_type_id(A)
ERROR: Cannot create a HDF5 datatype with fields containing that datatype.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] hdf5_type_id(#unused#::Type{A}, isstruct::Val{true})
@ Main ./REPL[3]:8
[3] hdf5_type_id
@ ~/.julia/packages/HDF5/aiZLs/src/typeconversions.jl:71 [inlined]
[4] hdf5_type_id(#unused#::Type{C{A}}, isstruct::Val{true})
@ Main ./REPL[3]:15
[5] hdf5_type_id
@ ~/.julia/packages/HDF5/aiZLs/src/typeconversions.jl:71 [inlined]
[6] hdf5_type_id(#unused#::Type{B{A}}, isstruct::Val{true})
@ Main ./REPL[3]:15
[7] hdf5_type_id
@ ~/.julia/packages/HDF5/aiZLs/src/typeconversions.jl:71 [inlined]
[8] hdf5_type_id(#unused#::Type{A}, isstruct::Val{true})
@ Main ./REPL[3]:15
[9] hdf5_type_id(#unused#::Type{A})
@ HDF5 ~/.julia/packages/HDF5/aiZLs/src/typeconversions.jl:71
[10] top-level scope
@ REPL[13]:1
Any feedback on the above?
It seems pretty bullet proof to me. I’m not an expert on task_local_storage
, but also don’t see a simpler way to do without. In any case, it is much better than the current behavior.
Can we even write non-isbitstype
s? Why not just throw an error?
A mutable struct is not a bitstype. I'm not sure if we can write them yet, but I do think it would be possible in the future for us to write some simple mutable structs. If a mutable struct contains all bitstypes, then I think we can map it to a corresponding NamedTuple
which we could then write.
julia> struct Foo
x::Int
end
julia> mutable struct Bar
x::Int
end
julia> HDF5.Datatype(HDF5.hdf5_type_id(Foo))
HDF5.Datatype: H5T_COMPOUND {
H5T_STD_I64LE "x" : 0;
}
julia> HDF5.Datatype(HDF5.hdf5_type_id(Bar))
HDF5.Datatype: H5T_COMPOUND {
H5T_STD_I64LE "x" : 0;
}
julia> isbitstype(Foo)
true
julia> isbitstype(Bar)
false
julia> to_namedtuple(x::T) where T = NamedTuple{fieldnames(T)}(ntuple(i->getfield(x,i), fieldcount(T)))
to_namedtuple (generic function with 1 method)
julia> to_namedtuple(Bar(5))
(x = 5,)