fetchHenry() NA-padding for weekly / monthly granularity

Open dylanbeaudette opened this issue 2 years ago • 6 comments

TODO:

  • [ ] generalize .fill_missing_days() with new function and gran argument: days, weeks, months
  • [ ] adapt .formatDates() with new function
  • [ ] new argument to fetchHenry() for generic NA padding
  • [ ] updated docs and tutorial

Further research: https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates

First approximation here.


.fillMissingGran <- function(x, gran) {
  
  ## TODO this doesn't account for leap-years
  # 366 days
  # 53 weeks
  
  # sequence of possible values
  g.vect <- switch(
    gran,
    'day' = 1:365,
    'week' = 1:52,
    'month' = 1:12
  )
  
  # column to use
  # week / month_numeric are missing
  g.col <- switch(
    gran,
    'day' = 'doy',
    'week' = 'week',
    'month' = 'month_numeric'
  )
  
  # format string
  g.fmt <- switch(
    gran,
    'day' = '%Y %j %H:%M',
    'week' = '%Y %W %H:%M',
    'month' = '%Y %m %H:%M'
  )
  
  
  # add time ID columns as-needed
  # doy is always present
  
  ## "week" not as simple as it seems
  # https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates
  
  # week
  if(gran == 'week') {
    x$week <- as.integer(format(x$date_time, '%W'))
  }
  
  # month
  if(gran == 'month') {
    x$month_numeric <- as.integer(format(x$date_time, '%m'))
  }
  
  
  # ID missing time IDs
  missing <- which(is.na(match(g.vect, x[[g.col]])))
  
  # short-circuit
  if (length(missing) < 1) {
    return(x)
  }
  
  
  # make fake date-times for missing time IDs
  fake.datetimes <- paste0(x$year[1], ' ', missing, ' 00:00')
  
  # TODO: this will result in timezone specific to locale; 
  #  especially an issue when granularity is less than daily or for large extents
  fake.datetimes <- as.POSIXct(fake.datetimes, format = g.fmt)
  
  # generate DF with missing information
  fake.data <- data.frame(
    sid = x$sid[1],
    date_time = fake.datetimes, 
    year = x$year[1],
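    # derive doy from the placeholder date-times (missing.days is not defined in this function)
    doy = as.integer(format(fake.datetimes, '%j')),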
    month = format(fake.datetimes, "%b")
  )
  
  fill.cols <- which(!colnames(x) %in% colnames(fake.data))
  if (length(fill.cols) > 0) {
    na.data <- as.data.frame(x)[, fill.cols, drop = FALSE][0,, drop = FALSE][1:nrow(fake.data),, drop = FALSE]
    fake.data <- cbind(fake.data, na.data)
  }
  
  # make datatypes for time match
  x$date_time <- as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
  
  # splice in missing data
  y <- rbind(x, fake.data)
  
  # re-order by DOY and return
  return(y[order(y$doy), ])
}




library(soilDB)

# generate example data
w <- fetchHenry(project = 'CA790', gran = 'week', soiltemp.summaries = FALSE, pad.missing.days = TRUE)

x <- w$soiltemp[w$soiltemp$sid == 392 & w$soiltemp$year == '1998', ]

plot(x$date_time, x$sensor_value, type = 'p')

.fillMissingGran(x, gran = 'week')
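
A possible way around the leap-year and timezone TODOs in the draft, offered as a minimal sketch rather than the eventual soilDB implementation. The helper name `.missingDatetimes()` and its arguments are hypothetical. Instead of parsing week or month numbers back into dates (strptime() appears to ignore `%W` on input unless a weekday is also given, and requires a day of month whenever a month is given), it enumerates every calendar day of the year, derives the time ID of each day, and keeps the first day of each ID that is absent from the data. Note that `%W` IDs start at 0, so the draft's `1:52` would never flag week 0.

```r
# Hypothetical helper (not part of soilDB): placeholder date-times for
# time IDs that are absent from one year of data.
#   ids:  time IDs present in the data (doy, week, or month number)
#   year: the year being padded
#   tz:   explicit time zone, so results do not depend on the session locale
.missingDatetimes <- function(ids, year, gran = c('day', 'week', 'month'), tz = 'UTC') {
  gran <- match.arg(gran)
  year <- as.integer(year)

  # every calendar day of the year, so leap years yield 366 days
  d <- seq(
    as.Date(sprintf('%d-01-01', year)),
    as.Date(sprintf('%d-12-31', year)),
    by = 'day'
  )

  # time ID of each calendar day at the requested granularity
  fmt <- switch(gran, day = '%j', week = '%W', month = '%m')
  d.id <- as.integer(format(d, fmt))

  # first calendar day of each time ID that is not present in `ids`
  first.day <- d[!duplicated(d.id) & !d.id %in% ids]

  # midnight in an explicit time zone
  as.POSIXct(format(first.day), tz = tz)
}

# months 1-11 present: a single placeholder for December is returned
.missingDatetimes(1:11, year = 1998, gran = 'month')
```

The returned date-times could stand in for `fake.datetimes` in the draft above, with doy, week, and month columns derived from them via format().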

dylanbeaudette avatar Sep 13 '22 20:09 dylanbeaudette

A note to extend methods where possible so that they can work with other data sources e.g. SCAN, CDEC

brownag avatar Oct 02 '22 15:10 brownag

Looks like we will also need to change the usage of base::as.POSIXct() format argument in soilDB:::.fill_missing_days() as it is breaking with R devel.

══ Failed tests ════════════════════════════════════════════════════════════════
── Error (test-fetchHenry.R:122:3): summarizeSoilTemperature() works as expected ──
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. ├─soilDB:::.formatDates(x, gran = "day", pad.missing.days = TRUE) at test-fetchHenry.R:122:2
 2. │ ├─...[]
 3. │ └─data.table:::`[.data.table`(...)
 4. └─soilDB:::.fill_missing_days(.SD)
 5.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 6.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")
── Error (test-fetchHenry.R:165:3): .fill_missing_days() works as expected ─────
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. └─soilDB:::.fill_missing_days(x) at test-fetchHenry.R:165:2
 2.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 3.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")

brownag avatar Oct 04 '22 19:10 brownag

I'll try to take a look next week sometime, unless you have time before then. Can you tackle the POSIX thing?

dylanbeaudette avatar Oct 04 '22 21:10 dylanbeaudette

I'll try to take a look next week sometime, unless you have time before then.

Take a look at this issue as a whole? I can probably take a crack at it this week sometime

Can you tackle the POSIX thing?

This is sorted w/ https://github.com/ncss-tech/soilDB/commit/6d4c02b553b52f67ffd4b0da9d8ae15c2c9ad0f4
as.Date() still takes a format arg, so I converted character -> Date explicitly with as.Date(..., format=) and then to POSIXct, and we are good.
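
A minimal sketch of the conversion described above, assuming character timestamps in "%Y-%m-%d %H:%M:%S" form. The input values and variable names are illustrative, and the linked commit may differ in detail.

```r
# illustrative input, not taken from the Henry database
x <- c('1998-01-05 00:00:00', '1998-01-06 00:00:00')

# as.Date() accepts a format argument; the time of day is dropped
d <- as.Date(x, format = '%Y-%m-%d %H:%M:%S')

# Date -> POSIXct, no format argument required
p <- as.POSIXct(d)
```

Because the time of day is discarded, this route only suits daily or coarser granularity.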

brownag avatar Oct 04 '22 21:10 brownag

I'll try to take a look next week sometime, unless you have time before then.

Take a look at this issue as a whole? I can probably take a crack at it this week sometime

Go for it if you have some time. I'm not going to have enough time this week.

Can you tackle the POSIX thing?

This is sorted w/ 6d4c02b as.Date() still takes format arg, so I converted character->Date explicitly with as.Date(..., format=) and then to POSIXct and we are good

Thanks, the as.Date( fix was news to me.

dylanbeaudette avatar Oct 05 '22 20:10 dylanbeaudette
