ITensors.jl
ITensors.jl copied to clipboard
Support for `h5read`
@emstoudenmire probably you have investigated this but I was interested in the current state of h5read when looking over the docs you moved over here: https://github.com/ITensor/ITensors.jl/pull/634.
Unfortunately, the more compact notation for reading using h5read doesn't work for some (maybe all?) of our types:
julia> using ITensors, ITensors.HDF5
julia> i = Index(2)
(dim=2|id=826)
julia> h5read("myfile.h5", "i" => Index)
ERROR: MethodError: no method matching h5read(::String, ::Pair{String, UnionAll})
Closest candidates are:
h5read(::Any, ::AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:487
h5read(::Any, ::Pair{var"#s53", DataType} where var"#s53"<:AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:500
h5read(::Any, ::AbstractString, ::Tuple{Vararg{Union{Colon, Int64, AbstractRange{Int64}}, N} where N}; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:513
Stacktrace:
[1] top-level scope
@ REPL[18]:1
julia> h5read("myfile.h5", "i" => Index{Int})
ERROR: MethodError: no method matching read(::HDF5.Group, ::Type{Index{Int64}})
Closest candidates are:
read(::Union{HDF5.Attribute, HDF5.Dataset}, ::Type{T}, ::Any...) where T at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1193
read(::Union{HDF5.File, HDF5.Group}, ::Pair{var"#s53", DataType} where var"#s53"<:AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1165
read(::Union{HDF5.File, HDF5.Group}, ::AbstractString; pv...) at /home/mfishman/.julia/packages/HDF5/iH4LA/src/HDF5.jl:1158
...
Stacktrace:
[1] h5read(filename::String, name_type_pair::Pair{String, DataType}; pv::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{0 Indices
, Tuple{}}})
@ HDF5 ~/.julia/packages/HDF5/iH4LA/src/HDF5.jl:505
[2] h5read(filename::String, name_type_pair::Pair{String, DataType})
@ HDF5 ~/.julia/packages/HDF5/iH4LA/src/HDF5.jl:501
[3] top-level scope
@ REPL[19]:1
There are two issues going on here. In the first issue, I believe it is more of a limitation on the HDF5.jl side, where Index is technically an abstract type Index{T} where {T} with typeof(Index) === UnionAll, while h5read only accepts a concrete type like Index{Int} with typeof(Index{Int}) === DataType. In the second issue, the problem is more on our side, since we don't allow specifying a concrete type in this case, so we could expand the definition here to accept a more general Type{<: Index}. However, this fix is still inconvenient, since we don't want people having to specify the particular concrete Index{Int} type.
The version without the explicit type runs:
julia> h5read("myfile.h5", "i")
Dict{String, Any} with 5 entries:
"dim" => 2
"id" => 0x1432306146b0bb12
"plev" => 0
"tags" => Dict{String, Any}("tags"=>"")
"dir" => 0
but of course doesn't output an Index, just some default data structure. It is difficult to find information about how to make this work for custom types, but my understanding from this comment is that you may need to define a custom file format. This makes a certain amount of sense, since then the file format can be determined and dispatched on, and without that there doesn't seem to be a way to tell HDF5.jl to do some special parsing of the files to search for ITensor-specific type information. @emstoudenmire, have you investigated this? That seems like the strategy the packages like JLD.jl take for saving and loading custom types.
Good catch that this doesn't work. I had in fact been wanting to move to a system where you don't have to specify the type at all, but instead when we write types to HDF5 it includes a string saying what the type is. This string would be eval'd when we read the data back in to generate an actual Julia type, at which point we call the type-specific version of h5read internally.
But you've definitely hit upon an issue that I was coming across, namely that we might want to write data of various closely related types such as Index{T} for various values of T. But then in a lot of cases it's not a priori clear how specific of a type we should put, and whether it should be a common abstract type like Index, or the exact type etc. Probably it should be the "exact enough" type, meaning enough to represent the data stored. So it's something we should discuss.
Yeah, in that case it seems like it could be a minimal amount of type information to reconstruct the original object. In general it isn't necessary to store Index{Int} since that can be deduced from what is stored in the file, just like the constructor Index(2) deduces it should be Index{Int}. A principle to use would be whatever minimal constructor call you would need in order to create the same object (i.e. Dense([1.0,2.0,3.0]) works so there is no need to store Dense{Float64} or something more complicated).
The issues I was more focusing on were:
- Can we directly use
i = h5read("myfile.h5", "i" => Index)or some variation instead of the longer notationf = h5open("myfile.h5", "r"); i = read(f,"i", Index); close(f)? It seems like the answer was no, not the wayHDF5.jlis currently written. In principle we could overloadHDF5.h5readto makei = h5read("myfile.h5", "i", Index)work but I don't see an equivalent version of that inHDF5.jlso it might be a bit sketchy to do that. - If we want to specifically overload the HDF5.jl function
HDF5.h5read("myfile.h5", "i"), I don't see how we can get that to deduce the type for us from a string inside the file (using anevalas you say) since it doesn't seem like there is a way to overload that behavior the wayHDF5.jlis written currently. So it seems to either require making a new function (sayITensors.h5reador whatever unique name) or inventing a new ITensor-specific file format likeh5itand useh5read("myfile.h5it", "i").
Also see the discussion here, a safer alternative to using eval would be to use a dictionary like Dict("ITensor" => ITensor, "Index" => Index, "Dense" => Dense, [...]). Presumably there are a finite number of types we will support for reading and writing which could get "registered" in this dictionary.
Glad you agree, and a related thought about which type to store is that the more abstract or general information that we put into the file, the more "forward compatible" or "future proof" it will be. So like e.g. if we decide to change some detail about whether the QN's are stored in a tuple or a vector (just making this up) and if that reflects in the type, not to have that break all previously written HDF5 files. But even if we do break compatibility, I had been thinking of and already implementing a versioning system where we put a version number and keep older reading-in code around if we detect older versions. It won't work perfectly but may allow some smooth upgrading across many versions.
To your point #2, you're right we couldn't really change the behavior of h5read("myfile.h5","i") without replacing or wrapping around the implementation in HDF5.jl, which would be a bad idea anyway. So probably the way to go there is a different function name that is introduced by ITensor, but also fairly general in its name. Something like h5get but ideally a better name than that.