dataRetrieval

Revisit dependencies

Open · ldecicco-USGS opened this issue 4 years ago · 3 comments

We chose readr initially because we thought data.table had a steeper learning curve. Simply converting to fread would be pretty easy, and would still save some time and remove some (a lot?) of dependencies.

While this is explored, we could re-re-think the lubridate dependency.

— ldecicco-USGS, Jan 22 '20 17:01

fread works great for WQP. Not as obviously great for RDB (those are the only 2 table readers we need). Here's the branch I'm working on: https://github.com/ldecicco-USGS/dataRetrieval/tree/fread

The reason I'm thinking about this is that CRAN is potentially going to limit the number of total dependencies a package can have (that's the rumor at least....). readr brings in a bunch of dependencies. data.table stands alone (and is mostly more efficient).
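For a rough sense of the footprints involved, the recursive dependency trees can be compared with `tools::package_dependencies()` (a sketch — it needs access to a CRAN mirror, and the counts shift as CRAN changes):

```r
library(tools)

db <- available.packages()
deps_readr <- package_dependencies("readr", db = db, recursive = TRUE)[["readr"]]
deps_dt <- package_dependencies("data.table", db = db, recursive = TRUE)[["data.table"]]

# readr pulls in a sizeable tree; data.table's recursive deps are minimal
length(deps_readr)
length(deps_dt)
```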

Right now though, fread doesn't have a comment.char argument, which makes reading the RDB format very difficult (since you need to both ignore the "#" lines and skip 2 lines).
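One possible workaround (just a sketch, not what the package does — `read_rdb` is a hypothetical helper) is to strip the comment lines ourselves and hand fread the remaining text via its `text` argument:

```r
library(data.table)

# Hypothetical helper: drop "#" comment lines before parsing, so fread
# never needs a comment.char argument. Note the RDB type row (e.g. "5s")
# is still left as the first data row, so columns come in as character.
read_rdb <- function(path) {
  raw_lines <- readLines(path)
  keep <- !startsWith(raw_lines, "#")
  fread(text = paste(raw_lines[keep], collapse = "\n"),
        header = TRUE, data.table = FALSE)
}
```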

So, I'll follow this Issue: https://github.com/Rdatatable/data.table/issues/856 because I think if that was implemented, we could make a switch.

In the meantime, we'll cross our fingers that the total dependency limit doesn't affect this (still a bit early to tell).

— ldecicco-USGS, Feb 25 '20 22:02

It turns out, data.table ignores lines that start with "#" by default! So maybe it's doable in the near term. In my fiddling, though, it seems that if you use that default (ignore "#"), you can't also use "skip" with just a number (in our case, 2). If you do use skip, it counts from the beginning of the file — so not ideal in our case, but not a deal breaker.

However, it's fast enough that we could read the whole thing in as character (which is how it seems to come in with our RDB formatting anyway) and use the top row to convert types:

library(dataRetrieval)
library(data.table)

siteINFO <- readNWISsite("05114000")
url <- attr(siteINFO, "url")

# Read everything as character; keepLeadingZeros preserves site numbers like "05114000"
colnames_data <- fread(url,
                       header = TRUE, data.table = FALSE,
                       keepLeadingZeros = TRUE)

# The first row of an RDB file holds the column type codes (e.g. "5s", "15d")
types <- unlist(colnames_data[1, ])
types <- gsub("\\d", "", types) # strip the width digits (this is R 4.0 regex!)
types <- gsub("d", "Date", types)
types <- gsub("s", "character", types)
types[grepl("_va", names(types))] <- "numeric" # "_va" columns are measured values

# Drop the type row, then convert the numeric columns
ret_df <- colnames_data[-1, ]
ret_df[, types == "numeric"] <- sapply(ret_df[, types == "numeric"], as.numeric)

This code chunk actually gets us pretty far.... "just" (HA!) need to figure out dates. A lot of the date logic though will still be baked in the codes, so it might not be too much more work.
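A hedged sketch of what that remaining date step might look like, assuming the "d" columns arrive in ISO "YYYY-MM-DD" form (real NWIS output also has partial dates and timezone-aware datetimes, which this ignores):

```r
# types and ret_df come from the chunk above; non-matching values become NA
date_cols <- names(types)[types == "Date"]
for (col in date_cols) {
  ret_df[[col]] <- as.Date(ret_df[[col]], format = "%Y-%m-%d")
}
```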

— ldecicco-USGS, Apr 29 '20 13:04

httr2? https://httr2.r-lib.org/index.html
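For reference, the kind of pattern a switch to httr2 would imply (a sketch using httr2's pipe-based API — the URL, query parameters, and user-agent string here are illustrative, not dataRetrieval code):

```r
library(httr2)

resp <- request("https://waterservices.usgs.gov/nwis/site/") |>
  req_url_query(sites = "05114000", format = "rdb") |>
  req_user_agent("dataRetrieval-experiment") |>
  req_perform()

rdb_text <- resp_body_string(resp)
```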

— ldecicco-USGS, Sep 23 '21 13:09

Not so much that it's backlogged, but that it's a continuing task that should be revisited periodically. I don't think it's worth the effort to convert to data.table unless we do a major overhaul or readr starts causing problems.

— ldecicco-USGS, Jan 03 '24 15:01