Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

`collect(skipmissing(::RasterStack))` fails

Open asinghvi17 opened this issue 1 year ago • 10 comments

# Prepare a test dataset
rs = RasterStack((Raster(rand(X(1:10), Y(1:10)); missingval=NaN) for _ in 1:4)...; missingval = NaN)
# Create areas of known zero, just for verification
map(rs) do A
    A[1:2, 1:2] .= 0
    A[3:4, 3:4] .= NaN
    A
end
rs = replace_missing(rs, missing)
rs[Extent(X=(3,4),Y=(3,4))] |> skipmissing |> collect
ERROR: Use iterate(layers(s)) rather than `iterate(s)`
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] iterate(::RasterStack{…})
   @ DimensionalData ~/.julia/dev/DimensionalData/src/stack/stack.jl:169
 [3] iterate(::Base.SkipMissing{RasterStack{…}})
   @ Base ./missing.jl:252
 [4] _collect(cont::UnitRange{Int64}, itr::Base.SkipMissing{RasterStack{…}}, ::Base.HasEltype, isz::Base.SizeUnknown)
   @ Base ./array.jl:770
 [5] collect(itr::Base.SkipMissing{RasterStack{…}})
   @ Base ./array.jl:759
 [6] |>(x::Base.SkipMissing{RasterStack{…}}, f::typeof(collect))
   @ Base ./operators.jl:917
 [7] top-level scope

asinghvi17 avatar Sep 26 '24 21:09 asinghvi17

I specifically want to iterate over namedtuples here.

asinghvi17 avatar Sep 26 '24 21:09 asinghvi17

Yeah AbstractDimStack doesn't iterate like that.

Maybe it should. For now you can iterate over (stack[I] for I in DimIndices(stack))

rafaqz avatar Sep 27 '24 06:09 rafaqz

This is fixed in the latest DD, so will be fixed here in the next breaking change

rafaqz avatar Nov 02 '24 18:11 rafaqz

Opps actually not sure SkipMissing will work, lets test that it does

rafaqz avatar Nov 02 '24 18:11 rafaqz

I think a custom method for iterating over skipmissing over a RasterStack (or even AbstractDimStack?) would be nice to have. Right now it iterates, but since it always iterates tuples, nothing is ever missing and it just iterates everything. I think the expected behaviour is to iterate over cells where none of the layers are missing(val), similar to boolmask.

Just changing one line in the base implementation of Base.iterate results in exactly this (we also need to fix the eltype, though).

import DimensionalData as DD
function Base.iterate(itr::Base.SkipMissing{<:DD.AbstractDimStack}, state...)
    y = iterate(itr.x, state...)
    y === nothing && return nothing
    item, state = y
    while any(ismissing, item) # ismissing(item)
        y = iterate(itr.x, state)
        y === nothing && return nothing
        item, state = y
    end
    item, state
end

tiemvanderdeure avatar Jul 10 '25 15:07 tiemvanderdeure

Let's just make it a kwarg, I can see situations where both options (none missing vs all missing vs some specific layers missing) are necessary.

asinghvi17 avatar Jul 10 '25 15:07 asinghvi17

I think the expected behaviour is to iterate over cells where none of the layers are missing(val), similar to boolmask.

Exactly, this should be the default. There is no way to actually get a missing out of the iterator

rafaqz avatar Jul 11 '25 01:07 rafaqz

Yeah currently skipmissing(rs) is the exact same as (x for x in rs). Not really ever useful.

The question is if we should check for any(ismissing, item), all(ismissing, item), or allow both. If we want to allow both, we need some keyword argument to skipmissing, which I guess means we need our own types and cannot just rely on a specialized dispatch on iterate.

I think we should just go for the implementation above to keep things simple. You can always do something like (x for x in rs if !all(ismissing, x)) if you need different behaviour

tiemvanderdeure avatar Jul 11 '25 08:07 tiemvanderdeure

I think DD should check for any missing and Rasters should check for any missingval

rafaqz avatar Jul 11 '25 09:07 rafaqz

yeah I'm PR'ing to DD now. And then we need a separate Rasters PR

tiemvanderdeure avatar Jul 11 '25 09:07 tiemvanderdeure