SimpleSDMLayers.jl icon indicating copy to clipboard operation
SimpleSDMLayers.jl copied to clipboard

✅ resizing one layer to the size of another, and ambiguity in `rescale` function meaning

Open gottacatchenall opened this issue 4 years ago • 20 comments

Description of the to-do item

As far as I can tell, there is not a function to take a raster and change its resolution to match the dimensions of a target raster. Doing this would be straightforward by iterating over the longitudes(a), latitudes(a) in the smaller raster and then access the values of the other raster at that coord, and have the getindex(b::SimpleSDMLayer, lat, long) handle that conversion.

When trying to do this I found rescale(layer1::T, layer2::T) where {T <: SimpleSDMLayer},which scales the values of a given raster to match a target, but not dimensions. To avoid this semantic ambiguity, perhaps the method rescale could rescale both values and dimensions by default with kw arguments to turn either off

gottacatchenall avatar Oct 24 '21 20:10 gottacatchenall

my hacky workaround atm is

function make_same_dims(input::IT, target::TT) where {IT,TT <: SimpleSDMLayer}
	lat, long = collect(longitudes(target)), collect(latitudes(target))
	newgrid = similar(target)
	for lt in lat, lg in long
		newgrid[lt,lg] = input[lt,lg]
	end
	newgrid
end

gottacatchenall avatar Oct 24 '21 21:10 gottacatchenall

That's a good point, but ideally something we might want to do with some sort of spatial interpolation. There's a way to get the haversine distance, do we can use this. I wouldn't necessarily call it rescale, maybe reproject? There's a little more thinking to do on this one before moving to a PR.

tpoisot avatar Oct 24 '21 22:10 tpoisot

FWIW, GeoData calls this "resample" and forwards the functionality to ArchGDAL.gdalwarp

mkborregaard avatar Oct 25 '21 11:10 mkborregaard

That sounds like something we might do -- my current rough draft is

using SimpleSDMLayers
using StatsPlots
using Statistics

ref = SimpleSDMPredictor(WorldClim, BioClim, 3; left=-20.0, right=40.0, top=60.0, bottom=30.0)

target = SimpleSDMResponse(zeros(Float32, 200, 200), -12.056, 37.065, 33.1234, 55.06)

for p in keys(target)
    if isnothing(ref[p])
        target[p] = nothing
    else
        vals = SimpleSDMLayers._sliding_values(ref, p, 20.0)
        target[p] = isempty(vals) ? nothing : mean(vals)
    end
end

diff overlap

@gottacatchenall , given the info @mkborregaard provided, do you want to see if a version based on ArchGDAL.gdalwarp is possible? It's already a dependency, and we can write/read temp geotiff files...

tpoisot avatar Oct 25 '21 17:10 tpoisot

@tpoisot Yeah GeoData.resample wraps it with arguments for interpolation rules, it isn't currently a dependency though

i've also written a few functions to take a set of SimpleSDMLayers and return PCA layers which i could include in the same PR

gottacatchenall avatar Oct 25 '21 20:10 gottacatchenall

Two PRs! Mostly because I'd like to do some solid work on the PCA, to make sure we can drop layers into multivariate stats pretty much directly.

tpoisot avatar Oct 25 '21 20:10 tpoisot

Also I wouldn't bring too much stuff from Geo until they figure out whether they use the Geometry types or not...

tpoisot avatar Oct 25 '21 20:10 tpoisot

@tpoisot i'm not sure if GeoData.jl is affiliated with JuliaGeo, here's its deps, which don't seem too heavy

also in progress on opening prs for both PCA and this, from what i can tell interfacing with Mvstats is pretty easy

gottacatchenall avatar Oct 25 '21 20:10 gottacatchenall

Re. GeoData that's still a lot of deps. I'm curious to see how far we can go using ArchGDAL only, since we depend on that no matter what.

tpoisot avatar Oct 25 '21 23:10 tpoisot

As far as I can tell resample only uses ArchGDAL functions.I could port it, but it would effectively be copy/pasting which isn't ideal.

I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages

Thoughts? @tpoisot @mkborregaard @rafaqz

gottacatchenall avatar Oct 26 '21 12:10 gottacatchenall

So I don't think it would make sense to take a GeoData dep just to get resample. GeoData is a general raster package, very similar to R's terra (and their old raster package), and as such has a lot of overlapping functionality with SimpleSDMLayers (such as masking, cropping, extracting data, downloading data, aggregating, opening raster files etc etc). GeoData is more focused on being a general raster package, though, whereas in my understanding SimpleSDMLayers is focused on creating a smooth user experience for fitting SDM models.

One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.

I think most of the added deps for geodata (e.g. DiskArrays, Flatten, Mmap, ) relate to the functionality for working with on-disk rather than in-memory rasters.

When I originally posted here my suggestion was that you could be inspired (or essentially copy) GeoData's resample function, which should also be doable.

mkborregaard avatar Oct 26 '21 13:10 mkborregaard

I agree with that assessment @mkborregaard . Depending on GeoData.jl would be a larger change in philosophy and more like a decision to reduce the codebase here by half. GeoData generalizes many of the methods here, including resample.

Proper broadcasting, base array methods and the Tables.jl interface in GeoData.jl would also save work here. These things are hard to do properly.

So I wouldn't add a GeoData.jl dep just for resample. You can just copy code from GeoData.jl, and delete all of the generalization code. resample is just a wrapper to warp which is n-dimensionally generalised gdalwarp. Although @gottacatchenall I tend to agree that copy-pasting doesn't seem like a sustainable long-term strategy.

In terms of JuliaGeo, GeoData is no more affiliated than ArchGDAL is - its not a JuliaGeo package. Its my package, partly dog-fed by cesaraustralia but to do generic things better, not specifically to work with cesar packages.

It's also less tightly integrated with GeoInterface.jl than ArchGDAL is. I agree about the GeoInterface/GeometryBasics problem and will likely be involved in the solution in the coming months. But I don't think that's an insurmountable problem currently.

Many of the dependencies like DiskArrays/GeoInterface/ColorTypes/GeoFormatTypes are already ArchGDAL deps. HDF5 and NCDatasets are included because Requires.jl is not a great solution to handle dependency bounds, and we dont have proper glue packages yet. Please chip in there too if you would like the dependency problem resolved, I've used up my quota pushing that...

rafaqz avatar Oct 26 '21 13:10 rafaqz

Yeah for now I think copying warp will work

gottacatchenall avatar Oct 26 '21 14:10 gottacatchenall

Yep. GeoData.jl is not going anywhere if you want to cut some code and upgrade the tooling and functionality here later on. I just hope we can aim for integration rather than copy-paste as a long term goal.

rafaqz avatar Oct 26 '21 14:10 rafaqz

One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.

Medium-term, I don't think GeoData will become a dependency of this package. As you'll see in the manuscript we're preparing (on network distribution models), SimpleSDMLayers does things in a different way and is, therefore, both less and more flexible depending on the use-case. Functionalities as they are currently cover our research needs, so don't expect changes in development direction soon.

The overlap in data download is because we couldn't agree on an interface, and that ship has sailed, meaning that there are no plans to remove these functionalities from SimpleSDMLayers in order to use RasterDataSources either.

tpoisot avatar Oct 26 '21 15:10 tpoisot

Great, yeah then it definitely makes more sense to just copy the implementation over :-)

mkborregaard avatar Oct 26 '21 16:10 mkborregaard

I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages

@gottacatchenall I have thought about this with warp... it could start by wrapping the arguments to gdalwarp actually in ArchGDAL.jl instead of in GeoData.jl. ArchGDAL.jl kind of already is that shared lib you're talking about. But it needs a Dict based front-end to GDAL arguments rather than the current Vector of String approach. What I've written in warp could really be moved there. It's just probably going to be a bigger job doing that consistently for the whole package than it is in GeoData.jl.

rafaqz avatar Oct 26 '21 19:10 rafaqz

Looks like most of the calls to GDAL.jl happen in this file. Seems that if we changed the ArchGDAL wrappers (unsafe_gdalwarp, etc.) to pass the options keyword to a function that converts from whatever the options input type is to the vector of strings required by GDAL.jl, then this could work

gottacatchenall avatar Oct 28 '21 20:10 gottacatchenall

I'm pretty sure this has been discussed as a thing ArchGDAL needs already, so a PR would probably be appreciated.

rafaqz avatar Nov 01 '21 08:11 rafaqz

Here's the code I'm using for a project where we need to do this:

function coerce(template::TT, source::TS, f) where {TS<:SimpleSDMLayer,TT<:SimpleSDMLayer}
    coerced = similar(template)
    for k in keys(template)
        coerced[k] = f(clip(source, k .- stride(template), k .+ stride(template)))
    end
    return coerced
end

There are a few possible adjustments to make, but I wouldn't be opposed to adding this to a new release.

tpoisot avatar Apr 08 '22 13:04 tpoisot