HDF5.jl icon indicating copy to clipboard operation
HDF5.jl copied to clipboard

HDF5.Dataset as an AbstractArray

Open mkitti opened this issue 3 years ago • 7 comments

Should HDF5.Dataset be an AbstractArray?

Alternatively, should we build a wrapper?

julia> struct DatasetArrayWrapper{T, N} <: AbstractArray{T,N}
           parent::HDF5.Dataset
       end

julia> DatasetArrayWrapper(parent::HDF5.Dataset) = DatasetArrayWrapper{eltype(parent), ndims(parent)}(parent)
DatasetArrayWrapper

julia> Base.size(w::DatasetArrayWrapper) = size(w.parent)

julia> Base.getindex(w::DatasetArrayWrapper, I...) = getindex(w.parent, I...)

julia> Base.getindex(w::DatasetArrayWrapper, I::Vararg{Int}) = getindex(w.parent, I...)

We almost have the entire AbstractArray interface implemented for HDF5.Dataset: https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-array

mkitti avatar May 04 '22 21:05 mkitti

How does the wrapper benefit over a direct subtyping of Dataset <: AbstractArray

Logically, makes sense that it would be a subtype. I think the original rationale might have been to limit it's interface. Are there any potential unintended side effects of this change?

musm avatar May 18 '22 19:05 musm

We probably should implement AbstractDiskArray.jl. Perhaps stealing this, but querying more of the information rather than caching it like this implementation: https://github.com/AStupidBear/HDF5Utils.jl/blob/master/src/diskarrays.jl

mkitti avatar May 20 '22 09:05 mkitti

Do you think it would be best to make it a direct subtype:

HDF5.Dataset <: AbstractDiskArray 

vs. what HDF5Utils is doing where

HDF5DiskArray <: AbstractDiskArray

and then defining

function HDF5DiskArray(dset::HDF5Dataset)
...

Which I think they are only doing since they don't own the HDF5.Dataset type.

musm avatar May 20 '22 20:05 musm

Yes, we should look into making it a direct subtype. This is a 0.17 topic though.

In contrast to the HDF5Utils.jl implementation, I think our implementation should be lazy.

Another issue if we make it a direct subtype is that we will need to parameterize it. We will need to consider how to mitigate the compilation latency in this case.

mkitti avatar May 20 '22 20:05 mkitti

Please do subtype AbstractDiskArray. You can also just use the macros to apply the interface to you own types without subtyping, but subtyping is simpler for dispatch.

Myself and @meggart have wanted this to happen for quite a while, so if you have any problems feel free to ping.

rafaqz avatar Jun 07 '22 16:06 rafaqz

I'm going to let this slip to 0.18 since 0.17 already contains other breaking changes.

mkitti avatar Aug 30 '23 07:08 mkitti

Just linking my original implementation suggestion for a HDF5DiskArray: https://github.com/JuliaIO/HDF5.jl/issues/615

In case it helps.

meggart avatar Jan 12 '24 10:01 meggart