HDF5.jl
HDF5.jl copied to clipboard
HDF5.Dataset as an AbstractArray
Should HDF5.Dataset be an AbstractArray?
Alternatively, should we build a wrapper?
julia> struct DatasetArrayWrapper{T, N} <: AbstractArray{T,N}
parent::HDF5.Dataset
end
julia> DatasetArrayWrapper(parent::HDF5.Dataset) = DatasetArrayWrapper{eltype(parent), ndims(parent)}(parent)
DatasetArrayWrapper
julia> Base.size(w::DatasetArrayWrapper) = size(w.parent)
julia> Base.getindex(w::DatasetArrayWrapper, I...) = getindex(w.parent, I...)
julia> Base.getindex(w::DatasetArrayWrapper, I::Vararg{Int}) = getindex(w.parent, I...)
We almost have the entire AbstractArray interface implemented for HDF5.Dataset:
https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-array
How does the wrapper benefit over a direct subtyping of Dataset <: AbstractArray
Logically, makes sense that it would be a subtype. I think the original rationale might have been to limit it's interface. Are there any potential unintended side effects of this change?
We probably should implement AbstractDiskArray.jl. Perhaps stealing this, but querying more of the information rather than caching it like this implementation: https://github.com/AStupidBear/HDF5Utils.jl/blob/master/src/diskarrays.jl
Do you think it would be best to make it a direct subtype:
HDF5.Dataset <: AbstractDiskArray
vs. what HDF5Utils is doing where
HDF5DiskArray <: AbstractDiskArray
and then defining
function HDF5DiskArray(dset::HDF5Dataset)
...
Which I think they are only doing since they don't own the HDF5.Dataset type.
Yes, we should look into making it a direct subtype. This is a 0.17 topic though.
In contrast to the HDF5Utils.jl implementation, I think our implementation should be lazy.
Another issue if we make it a direct subtype is that we will need to parameterize it. We will need to consider how to mitigate the compilation latency in this case.
Please do subtype AbstractDiskArray. You can also just use the macros to apply the interface to you own types without subtyping, but subtyping is simpler for dispatch.
Myself and @meggart have wanted this to happen for quite a while, so if you have any problems feel free to ping.
I'm going to let this slip to 0.18 since 0.17 already contains other breaking changes.
Just linking my original implementation suggestion for a HDF5DiskArray: https://github.com/JuliaIO/HDF5.jl/issues/615
In case it helps.