fasterize icon indicating copy to clipboard operation
fasterize copied to clipboard

Rasterizing a factor field should yield a factor raster with appropriate RAT table (and related issues)

Open JoshOBrien opened this issue 5 years ago • 1 comments

At present fasterize() uses the input raster as a template, modifying many components of it @data slot (along with other slots) to form the raster that it returns. It does not, however, do anything to make sure that the @data@factor and @data@attributes slots -- the ones responsible for marking a raster as a factor and recording its lvels in a raster attribute table (RAT) -- are appropriately modified based on the type of polygon field used to rasterize them.

This leads to several undesirable and potentially surprising behaviors. For example:

  • if passed a numeric template raster, the output raster will always be numeric whether the field used to rasterize it is numeric or factor.

  • if passed a factor template raster, the function will always be a factor raster with the same RAT as the template raster, even if the field used to rasterize it is numeric.

  • if passed a factor template raster and a factor field, fasterize() will return a factor raster (as the user might expect) but one with the RAT of the template raster -- not the rasterizing field.

I am copying below a wrapper function that I have used for a while and that manages all the RAT information that fasterize() itself at present neglects to deal with. It ensures that:

  • numeric fields always return numeric rasters

  • factor fields always return factor rasters with RATs that include all levels of the field on which they were based

  • character fields return factor rasters with RATs that include as levels all unique character strings in the character field column.

(That last addresses Issue #24. I find that feature quite useful, but it is not as important as the first two bulleted items above.)

The longish listing below first demonstrates some of the undesirable behaviors of the current version of fasterize() and then demonstrates one way to address them, using a wrapper function I here call fasterize2() and which could potentially be used as a drop in replacement for the curent fasterize().

library(fasterize)
library(sf)
library(raster)
library(rasterVis)

##------------------------------------------------------------------------------
##  Example data
##------------------------------------------------------------------------------

##-----------##
##  Rasters  ##
##-----------##
set.seed(1)
## Numeric raster with values in 1:4
r_num <- raster(ncol = 20, nrow = 20)
r_num[] <- sample(1:4, size = ncell(r_num), replace = TRUE)
extent(r_num) <- extent(c(0, 20, 0, 20))
## Factor raster with 4 levels for landcover variable
r_fact <- as.factor(r_num)
rat <- levels(r_fact)[[1]]
rat[["landcover"]] <- c("oak","pine", "scrub", "grass")
levels(r_fact) <- rat

##----------##
## Polygons ##
##----------##
a <- list(rbind(c(2,8), c(5,14), c(8,8), c(2,8)))
b <- list(rbind(c(5,14), c(8,8), c(11,14), c(5,14)))
c <- list(rbind(c(11,14), c(14,14.5), c(14,13.5), c(11,14)))
pp <- st_sf(char = paste0("c_", 1:3),
            fact = factor(paste0("f_", 1:3)),
            num = 1:3,
            geometry = st_sfc(lapply(list(a, b, c), st_polygon)),
            crs = 4326)

## Plot
levelplot(r_fact, margin = FALSE,
          col.regions = rev(terrain.colors(4))) +
    layer(sp.polygons(as_Spatial(pp),
                      fill = blues9[c(3,6,9)],
                      col = NA))



##------------------------------------------------------------------------------
##  Several situations in which fasterize() performs less than ideally
##------------------------------------------------------------------------------

## (1) Numeric template raster + factor polygon attribute ------>
##     numeric raster
X <- fasterize(pp, r_num, field = "fact")
is.factor(X)

## (2) Factor template raster + numeric polygon attribute ------>
##     factor raster
X <- fasterize(pp, r_fact, field = "num")
is.factor(X)

## (3) Factor template raster + factor polygon attribute ------>
##     factor raster inheriting RAT from template raster, NOT polygon
##     attribute
X <- fasterize(pp, r_fact, field = "fact")
levels(r_fact)[[1]]
levels(X)[[1]]
pp$fact


##------------------------------------------------------------------------------
## Modified version of fasterize(), which outputs:
##  - factor rasters when passed character or factor polygon atttributes
##  - numeric rasters when passed numeric polygon attributes
##  - rasters whose RAT is based on polygon attribute, not input raster's RAT
##------------------------------------------------------------------------------

fasterize2 <- function (sf,
                        raster,
                        field = NULL,
                        fun = "last",
                        background = NA_real_,
                        by = NULL) {
  if (is.null(field)) {
        field <- names(sf)[1]
  }
  field_class <- class(sf[[field]])
  ## Convert character field to factor
  if (field_class == "character") {
    val <- sf[[field]]
    fac <- factor(val, levels = sort(unique(val)))
    sf[[field]] <- fac
  }
  ## Rasterize polygons
  out_raster <- .Call("_fasterize_fasterize", PACKAGE = "fasterize",
                      sf, raster, field, fun, background, by)
  ## Ensure that any RAT attached to output raster is derived from
  ## input char or factor field
  if(field_class %in% c("character", "factor")) {
    lev <- levels(sf[[field]])
    RAT <- data.frame(ID = seq_along(lev), VALUE = lev)
    names(RAT)[2] <- field
    ## Silence warning emitted when overwriting a RAT from input
    ## raster
    suppressWarnings(levels(out_raster) <- RAT)
  } else {
    out_raster@data@isfactor <- FALSE
    out_raster@data@attributes <- list()
  }
  out_raster
}


##------------------------------------------------------------------------------
##  Tests
##------------------------------------------------------------------------------

## Test fasterize2() with NUMERIC template raster
o_char1 <- fasterize2(pp, r_num, field = "char")
o_fact1 <- fasterize2(pp, r_num, field = "fact")
o_num1 <- fasterize2(pp, r_num, field = "num")
## Check results
is.factor(o_char1)
is.factor(o_fact1)
is.factor(o_num1)
levels(o_char1)
levels(o_fact1)
levels(o_num1)
levelplot(o_char1, col.regions = blues9[c(3,6,9)])
levelplot(o_fact1, col.regions = blues9[c(3,6,9)])
levelplot(o_num1, margin = FALSE)


## Test fasterize2() with FACTOR template raster
o_char2 <- fasterize2(pp, r_fact, field = "char")
o_fact2 <- fasterize2(pp, r_fact, field = "fact")
o_num2 <- fasterize2(pp, r_fact, field = "num")
## Check results by comparing with output of NUMERIC template raster
identical(o_char2, o_char1)
identical(o_fact2, o_fact1)
identical(o_num2, o_num1)

JoshOBrien avatar May 13 '19 05:05 JoshOBrien

@noamross Just pinging you once to ask whether you might welcome a PR incorporating these tweaks, or whether you've had a look at the above and are not so inclined.

Because the functionality added by the tweaks is so useful to me, I've included them in a function fasterizeDT in my own rasterDT package, but would of course rather make these changes upstream, if you think that makes sense.

JoshOBrien avatar Jul 21 '19 17:07 JoshOBrien