software-review
software-review copied to clipboard
Submission tsbox: Class-Agnostic Time Series
Submitting Author Name: Christoph Sax Submitting Author Github Handle: @christophsax Repository: https://github.com/christophsax/tsbox Version submitted: 3.1.9001 Submission type: Stats Badge grade: silver Editor: @rkillick Reviewers: @chamberlinc, @brunaw, @nunesmatt
Due date for @chamberlinc: 2021-11-23Due date for @brunaw: 2021-11-23 Due date for @nunesmatt: 2022-10-03 Archive: TBD Version accepted: TBD
- Paste the full DESCRIPTION file inside a code block below:
Package: tsbox
Type: Package
Title: Class-Agnostic Time Series
Version: 0.3.1.9001
Authors@R: person("Christoph", "Sax", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-7192-7044"))
Description: Time series toolkit with identical behavior for all
time series classes: 'ts','xts', 'data.frame', 'data.table', 'tibble', 'zoo',
'timeSeries', 'tsibble', 'tis' or 'irts'. Also converts reliably between these classes.
Imports:
data.table (>= 1.10.0),
anytime
Suggests:
testthat,
dplyr,
tibble,
tidyr,
forecast,
seasonal,
dygraphs,
xts,
ggplot2,
scales,
knitr,
rmarkdown,
tsibble (>= 0.8.2),
tsibbledata,
tibbletime,
tseries,
units,
zoo,
tis,
timeSeries,
nycflights13,
imputeTS,
spelling
License: GPL-3
Encoding: UTF-8
URL: https://www.tsbox.help, https://github.com/christophsax/tsbox
BugReports: https://github.com/christophsax/tsbox/issues
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
VignetteBuilder: knitr
Depends:
R (>= 2.10)
Config/testthat/parallel: true
Config/testthat/edition: 3
Language: en-US
Pre-submission Inquiry
- [X] A pre-submission inquiry has been approved in issue#457
General Information
- Who is the target audience and what are scientific applications of this package?
Anyone who works with time series. Many statistical packages require time series to be in a certain object (ts, xts, tsibble, data.frame). tsbox facilitates the conversion between these objects. It also provides a general toolkit that works the same way with all time series classes. {tsbox} is also mentioned in the rOpenSci Statistical Software Peer Review Section on Time Series.
-
Paste your responses to our General Standard G1.1 here, describing whether your software is:
- The first implementation of a novel algorithm; or
- The first implementation within R of an algorithm which has previously been implemented in other languages or contexts; or
- An improvement on other implementations of similar algorithms in R.
In the rOpenSci classification, this package is An improvement on other implementations of similar algorithms in R. Many time series packages, e.g., zoo or tsibble contain converter functions from one class to another. They often convert from their class to ts objects and back, but lack converters to other time series class.
In most cases, tsbox transforms an object into an augmented data.table. And uses the data.table infrastructure for efficient joining and reshaping. After computation, it restores the original input class. This restoring feature is
was also used in the xts::reclass() function of the xts package.
- Please include hyperlinked references to all other relevant software.
data.table: For efficient joining and reshaping xts: Similar reclassing mechanism tsibble: Tidy Temporal Data Frames and Tools
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Not applicable.
Badging
- What grade of badge are you aiming for? (bronze, silver, gold)
I probably need some advice on this. I think that tsbox complies with most standards that are applicable. Generality of usage is a particular feature that should qualify the package for silver.
- If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)
The most outstanding point is probably the one on generality:
Have a demonstrated generality of usage beyond one single envisioned use case.
The package facilitates work with time series in general. It can also be used to ease the burden of object testing and time series conversion for other time series packages.
Technical checks
Confirm each of the following by checking the box.
- [X] I/we have read the guide for authors and rOpenSci packaging guide.
- [X] I/we have read the Statistical Software Peer Review Guide for Authors.
- [X] I/we have run
autotestchecks on the package, and ensured no tests fail. - [X] The
srr_stats_pre_submit()function confirms this package may be submitted.
This package:
- [X] does not violate the Terms of Service of any service it interacts with.
- [X] has a CRAN and OSI accepted license.
- [X] contains a README with instructions for installing the development version.
Publication options
-
[X] Do you intend for this package to go on CRAN?
Already published on CRAN.
-
[ ] Do you intend for this package to go on Bioconductor?
Code of conduct
- [X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.
:rocket:
Editor check started
:wave:
Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
@christophsax Sorry about the mess up here - the checks server is currently running R4.0.3, so doesn't recognise native pipes which you use in some test code. Will upgrade asap and deliver your checks.
But I did't intend to use native pipes, that was an accident. I will remove them right now.
I did so now. Perhaps rerunning is sufficient.
Checks for tsbox (v0.3.1.9001)
git hash: aad57cea
- :heavy_check_mark: Package uses 'roxygen2'.
- :heavy_check_mark: Package has a 'contributing.md' file.
- :heavy_check_mark: Package has a 'CITATION' file.
- :heavy_check_mark: Package has a 'codemeta.json' file.
- :heavy_check_mark: All functions have examples.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_check_mark: Package 'DESCRIPTION' has a URL field.
- :heavy_check_mark: Package 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package is already on CRAN.
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_multiplication_x: Package coverage failed
- :heavy_multiplication_x: R CMD check found 1error.
- :heavy_check_mark: R CMD check found no warnings.
- :heavy_check_mark: All applicable standards [v0.0.1] have been documented in this package.
Important: All failing checks above must be addressed prior to proceeding
Package License: GPL-3
1. rOpenSci Statistical Standards (srr package)
This package is in the following category:
- Time Series
:heavy_check_mark: All applicable standards [v0.0.1] have been documented in this package
Click here to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 53 files) and
- 1 authors
- 3 vignettes
- no internal data file
- 2 imported packages
- 63 exported functions (median 7 lines of code)
- 277 non-exported functions in R (median 9 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:
loc= "Lines of Code"fn= "function"exp/not_exp= exported / not exported
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 53 | 96.0 | |
| files_vignettes | 3 | 90.8 | |
| files_tests | 44 | 98.7 | |
| loc_R | 2649 | 88.3 | |
| loc_vignettes | 344 | 80.6 | |
| loc_tests | 1951 | 92.1 | |
| num_vignettes | 3 | 93.1 | |
| n_fns_r | 340 | 93.4 | |
| n_fns_r_exported | 63 | 90.4 | |
| n_fns_r_not_exported | 277 | 93.8 | |
| n_fns_per_file_r | 4 | 51.4 | |
| num_params_per_fn | 1 | 1.1 | TRUE |
| loc_per_fn_r | 8 | 29.0 | |
| loc_per_fn_r_exp | 7 | 14.2 | |
| loc_per_fn_r_not_exp | 9 | 41.9 | |
| rel_whitespace_R | 25 | 91.6 | |
| rel_whitespace_vignettes | 32 | 87.8 | |
| rel_whitespace_tests | 28 | 98.0 | TRUE |
| doclines_per_fn_exp | 41 | 51.1 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 475 | 94.6 |
2a. Network visualisation
Interactive network visualisation of calls between objects in package can be viewed by clicking here
3. goodpractice and other checks
Details of goodpractice and other checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
| name | conclusion | sha | date |
|---|---|---|---|
| R-CMD-check | success | aad57c | 2021-09-18 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following error:
- checking tests ...
Running ‘spelling.R’
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
█
- ├─fl[i]
- │ └─tsbox::ts_apply(x, ff, ...)
- │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
- │ ├─x[, fun(.SD, ...), by = eval(.by)]
- │ └─data.table:::
[.data.table(x, , fun(.SD, ...), by = eval(.by)) - └─tsbox:::fun(.SD, ...)
- └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...)
-
├─seasonal::final(seasonal::seas(x, ...)) -
└─seasonal::seas(x, ...) -
└─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE)
[ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted
R CMD check generated the following test_fail:
-
library(testthat) Warning message: package 'testthat' was built under R version 4.1.0
library(tsbox)
test_check("tsbox") Starting 2 test processes ══ Skipped tests ═══════════════════════════════════════════════════════════════ • On CRAN (17)
══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-units.R:41:5): tsbox works with units ─────────────────────────── Error: Process terminated Backtrace: █
- ├─fl[i]
- │ └─tsbox::ts_apply(x, ff, ...)
- │ └─tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
- │ ├─x[, fun(.SD, ...), by = eval(.by)]
- │ └─data.table:::
[.data.table(x, , fun(.SD, ...), by = eval(.by)) - └─tsbox:::fun(.SD, ...)
- └─(function(x, ...) seasonal::final(seasonal::seas(x, ...)))(...)
-
├─seasonal::final(seasonal::seas(x, ...)) -
└─seasonal::seas(x, ...) -
└─seasonal::checkX13(fail = TRUE, fullcheck = FALSE, htmlcheck = FALSE)
[ FAIL 1 | WARN 8 | SKIP 17 | PASS 644 ] Error: Test failures Execution halted
R CMD check generated the following check_fail:
- rcmdcheck_tests_pass
Test coverage with covr
ERROR: Test Coverage Failed
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
| function | cyclocomplexity |
|---|---|
| ts_span | 34 |
| time_shift | 21 |
| ts_plot | 21 |
| copy_class | 19 |
Static code analyses with lintr
lintr found the following 178 potential issues:
| message | number of times |
|---|---|
| Lines should not be more than 80 characters. | 178 |
:heavy_check_mark: Package has at least one HTML vignette
Package Versions
| package | version |
|---|---|
| pkgstats | 0.0.0.311 |
| pkgcheck | 0.0.1.486 |
| srr | 0.0.1.107 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
R CMD check
There seems to be an issue with the package seasonal on your system. The package is special since it uses x13binary which downloads pre-built binaries from CRAN. Usually this works fine and on most CRAN platforms.
Is there anything special about your system? There was an issue for Mac M1.
If you want to check, this should works:
library(seasonal)
seas(AirPassengers)
checkX13()
But I don't want you to drag into this. I can also remove the few tests that are using seasonal, which should then resolve the issue?
Coverage
Perhaps this is just because of the failing R CMD check? If not, is there a way to see what ist the measured coverage? I had 81.2% on my system, which passed the threshold (see below).
Report on my side
If I run pkgcheck::pkgcheck(".") on my side, I get:
#### Coverage
✔ Package uses 'roxygen2'.
✔ Package has a 'contributing.md' file.
✔ Package has a 'CITATION' file.
✔ Package has a 'codemeta.json' file.
✔ All functions have examples.
✔ Package has at least one HTML vignette
✔ Package 'DESCRIPTION' has a URL field.
✔ Package 'DESCRIPTION' has a BugReports field.
✔ Package is already on CRAN.
✔ Package has continuous integration checks.
✔ Package coverage is 81.2%.
✔ R CMD check found no errors.
✔ R CMD check found no warnings.
✔ All applicable standards [v0.0.1] have been documented in this package.
ℹ Current status:
✔ This package may be submitted.
@ropensci-review-bot assign @rkillick as editor
Assigned! @rkillick is now the editor
The goodpractice() checks came back with:
- Use <- instead of = ts_frequency.R L120
- Lines longer than 80 characters: ts_bind.R L58, L62 ts_first_of_period.R L12, L13, L18 test-ts_frequency.R L33
I had problems running covr() but see that it is passing on the checks above. Please fix the above and I'll look to assign reviewers as these are small changes.
Reviewer: @chamberlinc Due date: 2021-11-18
Reviewer: @brunaw Due date: 2021-11-22
@ropensci-review-bot add @chamberlinc to reviewers
I'm sorry @rkillick, I'm afraid I can't do that. That's something only editors are allowed to do.
@rkillick I've now added you to the editors team, sorry about this! You can now repeat the comment and it should work. Thanks for your patience!
@ropensci-review-bot add @chamberlinc to reviewers
@chamberlinc added to the reviewers list. Review due date is 2021-11-23. Thanks @chamberlinc for accepting to review! Please refer to our reviewer guide.
@chamberlinc: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @brunaw to reviewers
@brunaw added to the reviewers list. Review due date is 2021-11-23. Thanks @brunaw for accepting to review! Please refer to our reviewer guide.
@brunaw: If you haven't done so, please fill this form for us to update our reviewers records.
@rkillick @christophsax , thank you for the opportunity to review, and my apologies that this has taken so long. Below is my review with some comments. I also have uploaded the knitted html notebook that details the tests I ran at https://github.com/chamberlinc/tsbox-review. The code I used to test the package functions is also available in this repo.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- [X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [X] Installation instructions: for the development version of package and any non-standard dependencies in README
- [X] Vignette(s): demonstrating major functionality that runs successfully locally
- [X] Function Documentation: for all exported functions
- [X] Examples: (that run successfully locally) for all exported functions
- [X] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL,BugReportsandMaintainer(which may be autogenerated viaAuthors@R).
Functionality
- [X] Installation: Installation succeeds as documented.
- [ ] Functionality: Any functional claims of the software been confirmed.
Errors encountered in ts_plot(), ts_xts() and ts_long(). I am including the structure of the data frame I was working with in the code below:
Errors with ts_xts() and ts_long():
> Conn_discharge_DO <- dataRetrieval::readNWISuv(
+ siteNumbers = "01193050",
+ parameterCd = c("00060", "00300"),
+ startDate = "2019-01-01",
+ endDate = "2021-01-01"
+ )
>
> str(Conn_discharge_DO)
'data.frame': 70519 obs. of 8 variables:
$ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
$ site_no : chr "01193050" "01193050" "01193050" "01193050" ...
$ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
$ X_00060_00000 : num 39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
$ X_00060_00000_cd: chr "A" "A" "A" "A" ...
$ X_00300_00000 : num 13.9 13.9 13.9 14 14 14 14 14 14 14 ...
$ X_00300_00000_cd: chr "A" "A" "A" "A" ...
$ tz_cd : chr "UTC" "UTC" "UTC" "UTC" ...
- attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=01193050&format=waterml,1.1&ParameterCd=00060,00300&startDT=2"| __truncated__
- attr(*, "siteInfo")='data.frame': 1 obs. of 13 variables:
..$ station_nm : chr "CONNECTICUT RIVER AT MIDDLE HADDAM, CT"
..$ site_no : chr "01193050"
..$ agency_cd : chr "USGS"
..$ timeZoneOffset : chr "-05:00"
..$ timeZoneAbbreviation: chr "EST"
..$ dec_lat_va : num 41.5
..$ dec_lon_va : num -72.6
..$ srs : chr "EPSG:4326"
..$ siteTypeCd : chr "ST-TS"
..$ hucCd : chr "01080205"
..$ stateCd : chr "09"
..$ countyCd : chr "09007"
..$ network : chr "NWIS"
- attr(*, "variableInfo")='data.frame': 2 obs. of 7 variables:
..$ variableCode : chr [1:2] "00060" "00300"
..$ variableName : chr [1:2] "Streamflow, ft³/s" "Dissolved oxygen, water, unfiltered, mg/L"
..$ variableDescription: chr [1:2] "Discharge, cubic feet per second" "Dissolved oxygen, water, unfiltered, milligrams per liter"
..$ valueType : chr [1:2] "Derived Value" "Derived Value"
..$ unit : chr [1:2] "ft3/s" "mg/l"
..$ options : chr [1:2] "" ""
..$ noDataValue : logi [1:2] NA NA
- attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
- attr(*, "statisticInfo")='data.frame': 1 obs. of 2 variables:
..$ statisticCd : chr "00000"
..$ statisticName: chr ""
- attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:53:57"
>
> # Try viewing both parameters together
> try(str(ts_xts(Conn_discharge_DO)))
more than one [value] column detected after [time] - using the outermost.
Are you using a wide data frame? To convert, use 'ts_long()'.
[time]: 'dateTime' [value]: 'X_00300_00000'
Error : cannot allocate vector of size 4.5 Gb
> try(str(ts_xts(ts_long(Conn_discharge_DO))))
[id] columns left of [time] column: 'agency_cd', 'site_no'
[time]: 'dateTime'
Error : 'value' column [value] is not numeric.
Errors with ts_plot():
> character_date_dat <- data.frame(
+ DateTime = c(
+ "2011-11-11 11:11:11", "2012-12-12 12:12:12", "2021-09-25 20:07:00"
+ ),
+ Value = c(1, 2, 8)
+ )
> try({ts_plot(character_date_dat)})
[time]: 'DateTime' [value]: 'Value'
Error in colnamesInt(x, neworder, check_dups = FALSE) :
argument specifying columns specify non existing column(s): cols[3]='Value'
> # I am downloading hydrological data timeseries from the USGS. This is my most frequent way of accessing timeseries data. Data comes as a data frame.
> Eno_discharge <- dataRetrieval::readNWISuv(
+ siteNumbers = "02085070",
+ parameterCd = "00060",
+ startDate = "2019-01-01",
+ endDate = "2021-01-01"
+ )
> # View the structure of the data frame
> str(Eno_discharge)
'data.frame': 70267 obs. of 6 variables:
$ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
$ site_no : chr "02085070" "02085070" "02085070" "02085070" ...
$ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
$ X_00060_00000 : num 424 421 424 428 428 428 428 428 435 435 ...
$ X_00060_00000_cd: chr "A" "A" "A" "A" ...
$ tz_cd : chr "UTC" "UTC" "UTC" "UTC" ...
- attr(*, "url")= chr "https://nwis.waterservices.usgs.gov/nwis/iv/?site=02085070&format=waterml,1.1&ParameterCd=00060&startDT=2019-01"| __truncated__
- attr(*, "siteInfo")='data.frame': 1 obs. of 13 variables:
..$ station_nm : chr "ENO RIVER NEAR DURHAM, NC"
..$ site_no : chr "02085070"
..$ agency_cd : chr "USGS"
..$ timeZoneOffset : chr "-05:00"
..$ timeZoneAbbreviation: chr "EST"
..$ dec_lat_va : num 36.1
..$ dec_lon_va : num -78.9
..$ srs : chr "EPSG:4326"
..$ siteTypeCd : chr "ST"
..$ hucCd : chr "03020201"
..$ stateCd : chr "37"
..$ countyCd : chr "37063"
..$ network : chr "NWIS"
- attr(*, "variableInfo")='data.frame': 1 obs. of 7 variables:
..$ variableCode : chr "00060"
..$ variableName : chr "Streamflow, ft³/s"
..$ variableDescription: chr "Discharge, cubic feet per second"
..$ valueType : chr "Derived Value"
..$ unit : chr "ft3/s"
..$ options : chr ""
..$ noDataValue : logi NA
- attr(*, "disclaimer")= chr "Provisional data are subject to revision. Go to http://waterdata.usgs.gov/nwis/help/?provisional for more information."
- attr(*, "statisticInfo")='data.frame': 1 obs. of 2 variables:
..$ statisticCd : chr "00000"
..$ statisticName: chr ""
- attr(*, "queryTime")= POSIXct[1:1], format: "2021-12-06 11:59:00"
> # Try viewing the discharge data using the ts_plot function
> try({ts_plot(Eno_discharge)})
[time]: 'dateTime' [value]: 'X_00060_00000'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
Items of 'old' not found in column names: [X_00060_00000]. Consider skip_absent=TRUE.
> try({ts_plot(Eno_discharge, skip_absent=TRUE)})
Error in FUN(X[[i]], ...) : ts_boxable(x) is not TRUE
- [X] Performance: Any performance claims of the software been confirmed.
- [ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
> library(testthat)
> library(tsbox)
>
> test_check("tsbox")
Starting 2 test processes
== Skipped tests ===============================================================
* empty test (1)
== Warnings ====================================================================
-- Warning (test-ts_first_of_period.R:18:3): ts_first_of_period works ----------
no non-missing arguments to min; returning Inf
Backtrace:
1. tsbox::ts_first_of_period(x)
2. tsbox::ts_apply(x, dts_first_of_period)
3. tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
4. tsbox:::fun(x, ...)
8. base::NextMethod("[")
== Failed tests ================================================================
-- Error (test-ts_first_of_period.R:18:3): ts_first_of_period works ------------
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
x
1. \-tsbox::ts_first_of_period(x)
2. \-tsbox::ts_apply(x, dts_first_of_period)
3. \-tsbox:::ts_apply_dts(ts_dts(x), fun, ...)
4. \-tsbox:::fun(x, ...)
5. +-time[(max(which(time <= smry$start)):min(which(time >= smry$end)))]
6. +-base::`[.POSIXct`(...)
7. | \-base::.POSIXct(NextMethod("["), attr(x, "tzone"), oldClass(x))
8. \-base::NextMethod("[")
[ FAIL 1 | WARN 1 | SKIP 1 | PASS 770 ]
Error: Test failures
Execution halted
1 error x | 1 warning x | 0 notes √
- [X] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 7
- [X] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
First, I enjoyed the opportunity to review this code. This is the first code review I have done for rOpenSci, and I hope the comments below are helpful.
I like the general purpose of this package and agree there is probably a lot of use for a package that allows easy conversion between R object types. The documentation I think is all ok, though I think it could be improved with a little more description of the expected structure of data. I did not have any issues with installation. There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review). The code snippets in the three vignettes all worked on my machine.
I will preface my comments by saying I am not an experienced user of data.tables, so following the code was a little tricky for me. I usually work with timeseries as data frames, and I was having trouble following the code and figuring out which columns were being assigned as data, timestamps, or identifiers. This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered. When working with data frames that only had one timeseries, the functions all worked simply and, it seemed, as intended.
I was mostly experimenting with the ts_plot(), ts_long(), and ts_xts() functions. ts_xts() produced an error while working with the data frame that had multiple timeseries. Following the prompting, I tried using ts_long(), however this also produced an error that I did not understand, because it seemed to be accessing columns that I did not intend. I didn't see any way to input the correct column names to the function. Once I manually pivoted the data frame using pivot_longer() though, I was able to use ts_xts() and it performed everything correctly I believe. ts_plot() produced error messages that it might have mattered how the columns were organized, and that would have been helpful to know. I was not able to get this function to work for several formats of data frame data, even when following the suggested prompts in the error messages.
The code I used to access data and experiment is provided as well at the linked repo. I hope this is helpful!
@chamberlinc thank you for your valuable review of the package.
@brunaw are you able to provide your review soon? It was due on 23 Nov.
@chamberlinc Thank you so much for your review! I will look into it in more detail, but the error messages/feedback in non-standard situations certainly seems worth to be addressed.
First of all, please apologize for my late answer. I originally planned to wait for the second review and lost track afterward.
@chamberlinc, thanks again for your very helpful review! I dealt with your points in PR #211.
Most of your problems arose when applying the functions to non-standard data frames, especially wide ones. In some cases, the messages already pointed to the use use of ts_long(), but the error messages were not too helpful. In PR #211, I tried to address some of these problems. The error messages are improved, and ts_long() can handle some more complex cases. It works now with a toy example of your data set.
A bug in ts_plot() also caused a weird error message. This bug is now resolved as well.
To sum up, I addressed the following three points from your review:
description of the expected data structure
I think it could be improved with a little more description of the expected structure of data.
Extended paragraph on data frames: https://www.tsbox.help/articles/tsbox.html#time-series-in-data-frames-1
Failing test
There was one test that failed on my machine (see details in the index.nb.html file in the linked repo, https://github.com/chamberlinc/tsbox-review).
The error and the warning did not occur on my system, but the computation in ts_first_of_period() did not work as expected. I added a new test and a fix. Both are green now on my systems (local and GHA tests in the repository).
Error Messages on wide data frames
This caused a few issues when I had data frames of multiple concurrent timeseries and was getting error messages indicating that the ordering of the columns somehow may have mattered.
I improved some of the error messages when functions were applied to wide data structures.
library(nycflights13)
suppressPackageStartupMessages(library(dplyr))
library(tsbox)
packageVersion("tsbox")
#> [1] '0.3.1.9002'
d3 <- weather |>
select(origin, time_hour, temp, humid, precip)
d3
#> # A tibble: 26,115 × 5
#> origin time_hour temp humid precip
#> <chr> <dttm> <dbl> <dbl> <dbl>
#> 1 EWR 2013-01-01 01:00:00 39.0 59.4 0
#> 2 EWR 2013-01-01 02:00:00 39.0 61.6 0
#> 3 EWR 2013-01-01 03:00:00 39.0 64.4 0
#> 4 EWR 2013-01-01 04:00:00 39.9 62.2 0
#> 5 EWR 2013-01-01 05:00:00 39.0 64.4 0
#> 6 EWR 2013-01-01 06:00:00 37.9 67.2 0
#> 7 EWR 2013-01-01 07:00:00 39.0 64.4 0
#> 8 EWR 2013-01-01 08:00:00 39.9 62.2 0
#> 9 EWR 2013-01-01 09:00:00 39.9 62.2 0
#> 10 EWR 2013-01-01 10:00:00 41 59.6 0
#> # … with 26,105 more rows
This is a wide data frame, so tsbox errors with a meaningful message:
ts_ts(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern
This also works if the order is different
d4 <- weather |>
select(origin, temp, humid, precip, time_hour)
d4
#> # A tibble: 26,115 × 5
#> origin temp humid precip time_hour
#> <chr> <dbl> <dbl> <dbl> <dttm>
#> 1 EWR 39.0 59.4 0 2013-01-01 01:00:00
#> 2 EWR 39.0 61.6 0 2013-01-01 02:00:00
#> 3 EWR 39.0 64.4 0 2013-01-01 03:00:00
#> 4 EWR 39.9 62.2 0 2013-01-01 04:00:00
#> 5 EWR 39.0 64.4 0 2013-01-01 05:00:00
#> 6 EWR 37.9 67.2 0 2013-01-01 06:00:00
#> 7 EWR 39.0 64.4 0 2013-01-01 07:00:00
#> 8 EWR 39.9 62.2 0 2013-01-01 08:00:00
#> 9 EWR 39.9 62.2 0 2013-01-01 09:00:00
#> 10 EWR 41 59.6 0 2013-01-01 10:00:00
#> # … with 26,105 more rows
ts_ts(d4)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'time_hour' [value]: 'precip'
#> Error: series has no regular pattern
Using ts_plot() on these malformed data frames has returned the following nonsensical error:
[time]: 'time_hour' [value]: 'precip'
Error in setnames(x, c(cid, ctime, cvalue), c("id", "time", "value")) :
Items of 'old' not found in column names: [precip]. Consider skip_absent=TRUE.
>
The skip_absent=TRUE is a data.table suggestion, not meant for use in tsbox. I fixed this in commit 50e694b8. The message now makes more sense:
ts_plot(d3)
#> Found numeric [id] column(s): 'temp', 'humid'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#>
#> [time]: 'time_hour' [value]: 'precip'
#> too many series. Only showing the first 20.
reflecting the fact that we have a large number of absurdly short time series (since value columns were classified as id columns)
I also tried to improve a bit on your examples. I construct a data frame that looks similar to yours:
# > str(Conn_discharge_DO)
# 'data.frame': 70519 obs. of 8 variables:
# $ agency_cd : chr "USGS" "USGS" "USGS" "USGS" ...
# $ site_no : chr "01193050" "01193050" "01193050" "01193050" ...
# $ dateTime : POSIXct, format: "2019-01-01 05:00:00" "2019-01-01 05:15:00" "2019-01-01 05:30:00" "2019-01-01 05:45:00" ...
# $ X_00060_00000 : num 39200 41300 40400 40400 41100 43000 41800 42300 41700 42400 ...
# $ X_00060_00000_cd: chr "A" "A" "A" "A" ...
Conn_discharge_DO <- tibble(
agency_cd = c("USGS", "USGS", "USGS", "USGS"),
site_no = c("01193050", "01193050", "01193050", "01193050"),
dateTime = as.POSIXct(c("2019-01-01 05:00:00", "2019-01-01 05:15:00", "2019-01-01 05:30:00", "2019-01-01 05:45:00")),
unit = "ft3/s",
X_00060_00000 = 1:4,
noDataValue = NA,
X_00060_000002 = 11:14,
X_00060_00000_cd = "A"
)
tsbox now detects X_00060_00000 as an id column. (As it says in the documentation, it starts on the right). And it suggests using ts_long().
ts_xts(Conn_discharge_DO)
#> Found numeric [id] column(s): 'X_00060_00000'.
#> Are you using a wide data frame? To convert, use 'ts_long()'.
#> Convert column(s) to character or factor to silence this message.
#> [time]: 'dateTime' [value]: 'X_00060_000002'
#> Loading required namespace: xts
#> USGS_01193050_ft3/s_1_NA_A USGS_01193050_ft3/s_2_NA_A
#> 2019-01-01 05:00:00 11 NA
#> 2019-01-01 05:15:00 NA 12
#> 2019-01-01 05:30:00 NA NA
#> 2019-01-01 05:45:00 NA NA
#> USGS_01193050_ft3/s_3_NA_A USGS_01193050_ft3/s_4_NA_A
#> 2019-01-01 05:00:00 NA NA
#> 2019-01-01 05:15:00 NA NA
#> 2019-01-01 05:30:00 13 NA
#> 2019-01-01 05:45:00 NA 14
ts_long() works now on this. It classifies character and factor columns right of the time column as id columns (with a message).
ts_long(Conn_discharge_DO)
#> found columns right to the [time] column that will be treated as [id] columns (character or factor): 'unit', 'X_00060_00000_cd'.
#> Additional [id] column(s): 'agency_cd', 'site_no', 'unit', 'X_00060_00000_cd'
#> [time]: 'dateTime'
#> # A tibble: 12 × 7
#> agency_cd site_no unit X_00060_00000_cd id dateTime value
#> <chr> <chr> <chr> <chr> <chr> <dttm> <int>
#> 1 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:00:00 1
#> 2 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:15:00 2
#> 3 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:30:00 3
#> 4 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:45:00 4
#> 5 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:00:00 NA
#> 6 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:15:00 NA
#> 7 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:30:00 NA
#> 8 USGS 01193050 ft3/s A noDataVa… 2019-01-01 05:45:00 NA
#> 9 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:00:00 11
#> 10 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:15:00 12
#> 11 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:30:00 13
#> 12 USGS 01193050 ft3/s A X_00060_… 2019-01-01 05:45:00 14
Hello Christoph, Apologies for the delay again. These updates I think are very good. The functions worked as expected this time with the data I typically work with. I had a few automated tests fail this time though, one because I have an older version of R that does not support the |> operator, and the rest all seemingly because of some issue accessing the function ts_dts.
The details of the automated test output you can find here: https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md. The rest of my review is below:
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- [X ] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
-
[ X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
-
[X ] Installation instructions: for the development version of package and any non-standard dependencies in README
-
[X ] Vignette(s): demonstrating major functionality that runs successfully locally
Caveat: I am running R 4.0.5 on my machine, and so I do not have native pipes. For the tsbox vignette, the code did work, but I first had to replace all instances |> with %>%. I’d recommend using the older syntax to be compatable with more versions of R.
- [X ] Function Documentation: for all exported functions
I did not check all of the functions’ documentation, but the subset I checked looked good.
- [X ] Examples: (that run successfully locally) for all exported functions
Again, only checked a subset, but all worked.
- [X \ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL,BugReportsandMaintainer(which may be autogenerated viaAuthors@R).
Functionality
- [X ] Installation: Installation succeeds as documented.
- [X ] Functionality: Any functional claims of the software been confirmed.
- [X ] Performance: Any performance claims of the software been confirmed.
- [ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. Several checks and tests still seem to fail, at least one of which is likely due to native pipes with my version of R. The other has to do with the ts_dts function.
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.5 magrittr_2.0.3 fastmap_1.1.0 cli_3.3.0
## [5] tools_4.0.5 htmltools_0.5.2 rstudioapi_0.13 yaml_2.3.5
## [9] stringi_1.7.6 rmarkdown_2.14 knitr_1.39 stringr_1.4.0
## [13] xfun_0.31 digest_0.6.29 rlang_1.0.2 evaluate_0.15
pkg_dir <- "C:/Users/Cathy/Documents/PeerReviewing/tsbox/R2/tsbox/"
try(devtools::check(pkg_dir))
## i Updating tsbox documentation
## i Loading tsbox
## Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(srcref_path %||% :
## C:\Users\Cathy\Documents\PeerReviewing\tsbox\R2\tsbox/tests/testthat/test-date_utils.R:9:41: unexpected '>'
## 8:
## 9: x1 <- ts_tbl(ts_c(mdeaths, fdeaths)) |>
## ^
devtools::test(pkg_dir)
## Output removed for clarity. View at https://github.com/chamberlinc/tsbox-review/blob/master/tsbox_R2.md ##
- [X ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 3
- [X ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.
Review Comments
The functions now work as expected for me for the types of data I typically use. Thanks for the updates! I found the messages on the ts_long function to be very helpful.
Very sorry for the slow turnaround.
Native Pipe
I changed the native pipe (|>) back to the magrittr one (%>%), so this should now work on your system with the latest version (0.3.1.9003).
I introduced the native pipe deliberately in the vignettes since I always wanted to keep the dependency graph very simple (tsbox only imports data.table and anytime) and wanted to make the vignettes available for non-magrittr/dplyr users. But I see that there is a trade-off between this aim and the support for older versions of R. The native pipe was introduced in R 4.1 on 2021-05-18 and is now available for almost 15 months. With more time passing, only a few older versions will be around. I changed it back for now but will go to the native pipe again at some point in the future.
Other Test Errors
I have a hard time reproducing them. First, all the tests run through on my system (macOS, R release) and all systems on GitHub Actions (ubuntu-20.04, release, devel), (windows-latest, release), (macOS-latest, release). So I wonder if this is another problem related to the old R version?
Your error says:
Error in `UseMethod("ts_dts")`: no applicable method for 'ts_dts' applied to an object of class "data.frame"
But there clearly is such a method:
https://github.com/christophsax/tsbox/blob/dd0447c66405da6101419c31b0fe3f0cffdfadc1/R/to_from_data.frame.R#L14-L18
If you still encounter the error in the tests, could you try to run one the failing example outside of the tests? Does this fail? It really shouldn't.
library(tsbox)
with_id <- wo_id <- ts_df(mdeaths)
with_id$id <- "mdeaths"
ts_c(wo_id, with_id)
Thank you so much for your work! I guess we are alomost done now. I will answer quickly this time :-)
Hello, and sorry for my slow turn around this time.
I think that plan for the pipes makes the most sense. Unfortunately there are a significant number of users who don't have control over updating their own versions of R because of the IT protocols of their companies or agencies. But, given enough time I agree it will be less of a problem and decreasing dependencies is a good goal.
The error that was occurring previously has resolved itself on my computer. However, I am now getting a new error with the test test-ts_first_of_period:
Warning (test-ts_first_of_period.R:19:3): ts_first_of_period works
no non-missing arguments to min; returning Inf
Backtrace:
1. tsbox::ts_first_of_period(x)
at test-ts_first_of_period.R:19:2
2. tsbox::ts_apply(x, dts_first_of_period)
at tsbox/R/ts_first_of_period.R:20:2
3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
at tsbox/R/ts_apply.R:44:2
4. tsbox fun(x, ...)
at tsbox/R/ts_apply.R:18:4
8. base::NextMethod("[")
Error (test-ts_first_of_period.R:19:3): ts_first_of_period works
Error in `max(which(time <= smry$start)):min(which(time >= smry$end))`: result would be too long a vector
Backtrace:
1. tsbox::ts_first_of_period(x)
at test-ts_first_of_period.R:19:2
2. tsbox::ts_apply(x, dts_first_of_period)
at tsbox/R/ts_first_of_period.R:20:2
3. tsbox::ts_apply_dts(ts_dts(x), fun, ...)
at tsbox/R/ts_apply.R:44:2
4. tsbox fun(x, ...)
at tsbox/R/ts_apply.R:18:4
8. base::NextMethod("[")
I tried stepping through the test code outside of the test function, and I got the same error. Going into the dts_first_of_period function, the error seems to be because which(time >= smry$end)) produces integer(0).
Other than that though, everything worked well.