terra icon indicating copy to clipboard operation
terra copied to clipboard

spatSample won't create the number of sample requested for each stratum

Open fernandabiologia opened this issue 3 years ago • 4 comments

Hi,

I'm trying to use spatSample to create a survey design that generate random points for each category of my raster:

pred_recl class : SpatRaster dimensions : 1686, 1872, 1 (nrow, ncol, nlyr) resolution : 0.0025, 0.0025 (x, y) extent : 143.8175, 148.4975, -43.7425, -39.5275 (xmin, xmax, ymin, ymax) coord. ref. : +proj=longlat +ellps=GRS80 +no_defs source : memory name : stratum min value : 1 max value : 3

set.seed(1) s <- spatSample(pred_recl, size = 20, method = "stratified", as.points=TRUE, xy=TRUE)

But when I run spatSample() I get the Warning messages: 1: [spatSample] fewer cells returned than requested 2: [sample] fewer samples than requested for strata: 2, 3

All categories are big enough to easily fit 20 points, so I don't understand why it doesn't create them.

To test if the issue was with my raster, I read my data with the raster package and tried the function sampleStratified (x = pred_recl, size = 20, xy = TRUE, sp = TRUE). This works just fine, so I thought I'd open an issue as it seems there is a problem with the spatSample function.

Thanks,

Fernanda.

fernandabiologia avatar Feb 28 '22 06:02 fernandabiologia

I think I have a related issue. When I use spatSample(method = "stratified) with a raster with one stratum that only covers a small area I get an error. If I run the same with method = "random" and na.rm = TRUE there is no error but also no samples. Everything works as expected if the area is larger. There are 25 cells in the stratum and I am only looking for 2 samples so I would expect it to be possible.

I tried with the CRAN release and the development version from R universe.

library(terra)

rr <- rast(ncol=100, nrow=100, names="stratum")

rr[6:10, 6:10] <- 1

# returns an error
spatSample(rr, 2, "stratified")
# Error in `colnames<-`(`*tmp*`, value = c("cell", names(x))) : 
#   attempt to set 'colnames' on an object with less than two dimensions
# In addition: Warning message:
#   [spatSample] fewer cells returned than requested 

# No error but no samples either
spatSample(rr, size = 2, 
                  method = "random", na.rm = TRUE)
# [1] stratum
# <0 rows> (or 0-length row.names)
# Warning message:
#   [spatSample] fewer values returned than requested

# works fine if the patch is big enough
rr <- rast(ncol=100, nrow=100, names="stratum")

rr[6:30, 6:30] <- 1

spatSample(rr, 2, "stratified")

spatSample(rr, size = 2, 
           method = "random", na.rm = TRUE)

see24 avatar Mar 25 '22 20:03 see24

Running into this again and going to try and contribute. I have two ideas one is that if no values are sampled in a stratum try setting other values to NA and then trimming so that if the stratum is in a clump will be less likely to miss it. This would work for my use case but perhaps not many others? The other idea is use the notfound flag that is recorded to redo samples for those strata with a larger sample size.

Am I correct in understanding that the current approach is just to take a random sample that is 5 times larger than the desired sample size times the number of strata and then keep enough samples to match the sample size and warn if that was not reached?

see24 avatar May 17 '22 19:05 see24

Hello, Do you know if this issue has been looked into? I have a similar problem to the one described by @fernandabiologia but with the "random" sampling method.

The input raster I want to sample the locations from has 178,276 cells that are not NA and I want to randomly select 1% of them but the following command only returns 10 locations.

potential.ignition.points <- spatSample(input.raster, size = 180, method = "random", na.rm = TRUE, as.points = TRUE, as.df = FALSE, ext = ext(input.raster))

input.raster class : SpatRaster dimensions : 4867, 4349, 1 (nrow, ncol, nlyr) resolution : 100, 100 (x, y) extent : -1611382, -1176482, -3930115, -3443415 (xmin, xmax, ymin, ymax) coord. ref. : GDA94 / MGA zone 55 (EPSG:28355) source : memory name : layer min value : 1 max value : 1

Many thanks, Amelie

AmelieJeanneau avatar Jun 22 '22 05:06 AmelieJeanneau

@AmelieJeanneau The pull request that I created might help with that. If you want to try it you could install from my fork here https://github.com/see24/terra. I just fetched the changes from terra so it should be up to date. I haven't heard anything on this issue or my pull request

see24 avatar Jun 22 '22 14:06 see24

I believe these issues have now been solved; or at least partly. Feel free to open a new issue if you run into this problem again, and please include a reproducible example if possible.

rhijmans avatar Sep 04 '22 19:09 rhijmans

Hello @rhijmans,

I am experiencing this problem. I am trying to sample different numbers of pixels randomly, but each time it returns a smaller number than asked.

Unfortunately github won't accept a .tif file, but here is the line of code I used:

samp = spatSample(x, size = 4000, method = 'random', as.points=TRUE, na.rm = TRUE)

Additionally, I would like to generate a regular sample, with pixels chosen every 9 km. However, I only want pixels with values (ignoring NAs). When I set method = 'random', it returns mostly NAs (as there are many NAs in my file). Is it possible to sample regular by perhaps sampling the nearest pixel with a value to the NA pixel that would be chosen if strictly following a regular sampling design?

matthewseanmarcus avatar Nov 01 '22 19:11 matthewseanmarcus

Thanks. Can you please add two new issues? One to report a possible bug, and please be as specific as possible, by including a reproducible example that recreates this problem. Or, at the very least, include the result of show(x) and global(x, "notNA"). The other one would be for a feature request (a new way to get a regular sample).

rhijmans avatar Nov 02 '22 04:11 rhijmans

Thanks @rhijmans

I think the problem was simply a matter of not using the latest version of terra. I updated the package and now the issue appears to be resolved. I just added an issue for the regular sampling feature.

Thanks!

matthewseanmarcus avatar Nov 02 '22 15:11 matthewseanmarcus

Dear rhijmans, I meet this problem again.

a1 = terra::spatSample(x1, 2000, as.df=T, replace=T, na.rm=T, xy=T)
#Warning message:
#[spatSample] fewer cells returned than requested 

x1
#class       : SpatRaster 
#dimensions  : 24093, 14980, 3  (nrow, ncol, nlyr)
#resolution  : 30, 30  (x, y)
#extent      : 110850, 560250, 3077450, 3800240  (xmin, xmax, ymin, ymax)
#coord. ref. : WGS 84 / UTM zone 48N (EPSG:32648) 
#sources     : spat_lvzsihE3Db1Mjos_4516.tif  
#              bufdem30mutm.tif  
#              aspectcos1.tif  
#names       : bamboo21,      elev, aspect 
#min values  :     TRUE,  218.7693,     -1 
#max values  :     TRUE, 7468.8027,      1 

global(x1, "notNA")
#             notNA
#bamboo21   2979387
#elev     146837026
#aspect   146677365

I set size=2000, but, I can only get 90 cells' value, I update terra to 1.6-43, but still get such result.

PetiteTong avatar Nov 22 '22 02:11 PetiteTong

@PetiteTong I suspect that the reason for this is that only very few cells are not NA for all three layers. For "bamboo21" it is less than 1%.

2979387 / (24093 * 14980)
#[1] 0.008255136

Perhaps the number of cells where all three layers are not NA is 90? You can find out with

global(anyNA(x1), sum)

If there are more than 90 cells you could also try to increase argument exp

rhijmans avatar Nov 22 '22 03:11 rhijmans

thanks, I tried,

global(anyNA(x1), sum)
#           sum                            
#lyr1 357933753

maybe what we need is any notNA?

PetiteTong avatar Nov 22 '22 04:11 PetiteTong

It is indirect, but by subtracting that number from the number of cells, we can see that where is data in bamboo2 there is data in the other layers:

 (24093 * 14980) - 357933753
#[1] 2979387

They are just very hard to find when the values are so sparse. I can add a separate method for this (feel free to add a new issue so that I do not forget). For now, you could sample repeatedly and combine the samples, If you keep the cell numbers you can get a sample without replacement by running unique.

rhijmans avatar Nov 22 '22 16:11 rhijmans

Ok, thanks a lot~!

PetiteTong avatar Nov 23 '22 01:11 PetiteTong