ArchGDAL.jl icon indicating copy to clipboard operation
ArchGDAL.jl copied to clipboard

Support PermutedDimsArray for interactive datasets

Open maxfreu opened this issue 4 years ago • 3 comments

Hi, I noticed the following:

img = AG.readraster("somefile");
typeof(img)
# ArchGDAL.RasterDataset{Int16, ArchGDAL.IDataset}
@time collect(img)  # fast, 0.6s

imgp = permutedims(img, (3,1,2));
typeof(imgp)
# DiskArrays.PermutedDiskArray{Int16, 3, PermutedDimsArray{Int16, 3, (3, 1, 2), (2, 3, 1), ArchGDAL.RasterDataset{Int16, ArchGDAL.IDataset}}}
# So underneath it's already a PermudedDimsArray somehow
@time collect(imgp) # fast 0.7s

imgpda = PermutedDimsArray(img, (3,1,2));
typeof(imgpda)
# PermutedDimsArray{Int16, 3, (3, 1, 2), (2, 3, 1), ArchGDAL.RasterDataset{Int16, ArchGDAL.IDataset}}
@time collect(imgpda) # super slow, 43s

So I assume the PermutedDimsArray translates to super inefficient disk access. As ArchGDAL already handles the interactive dataset lazily with permutedims I propose the super simple fix:

Base.PermutedDimsArray(A::RasterDataset{Any, IDataset}, perm) = permutedims(A, perm)

maxfreu avatar Aug 12 '21 09:08 maxfreu

Hmm, interesting.

Perhaps it's best to raise this issue with DiskArrays? cc @meggart

Over there permutedims is forwarded to PermutedDiskArray: https://github.com/meggart/DiskArrays.jl/blob/7fa75d4a0eb216d7e25a6718b223b8533d0b92c4/src/permute_reshape.jl#L64-L67

Probably Base.PermutedDimsArray should be forwarded to the same (like your proposed fix, but in DiskArrays).

visr avatar Aug 12 '21 10:08 visr

Probably Base.PermutedDimsArray should be forwarded to the same (like your proposed fix, but in DiskArrays).

This proposed fix does not work because the PermutedDiskArray already wraps a PermuteDimsArray, and just gives it disk-efficient access patterns, so there is something cyclic when you try to construct such a thing. One could solve this by adding some invoke calls, but in general I think it would also be strange if a constructor of PermutedDimsArray would return an object of a PermutedDiskArray.

What would speak against directly calling permutedims, which gives you the desired performance?

I really don't see another solution to this problem. If you wrap your array into a PermutedDiskArray, which is defined in Base, this is what you get and if this type assumes efficient random access to the underlying data it will be slow.

meggart avatar Aug 13 '21 15:08 meggart

What would speak against directly calling permutedims, which gives you the desired performance?

Not much. I just would have expected that it's equally fast. This finding resulted from trying to avoid eager loading from disk. Constructing a PermutedDimsArray is the canonical way to do things lazily (just like view for getindex), so I tried that first and failed. I thought maybe others might fall into that trap too, if they write "lazy" code. I leave it up to you to close this (and the other issue), because I have no further ideas how to handle this; returning a PermutedDiskArray sounds strange indeed.

maxfreu avatar Aug 13 '21 17:08 maxfreu