Presubmission inquiry: GHCNr

Open emilio-berti opened this issue 7 months ago • 7 comments

Submitting Author Name: Emilio Berti
Submitting Author Github Handle: @emilio-berti
Repository: https://github.com/emilio-berti/GHCNr
Submission type: Pre-submission
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: GHCNr
Title: Download Weather Station Data from GHCNd
Version: 1.4.5
Authors@R: 
    person("Emilio", "Berti", , "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-9286-011X"))
Description: The goal of 'GHCNr' is to provide a fast and friendly interface with the Global Historical Climatology Network daily (GHCNd) database, which contains daily summaries of weather station data worldwide (<https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily>). GHCNd is accessed through the web API <https://www.ncei.noaa.gov/access/services/data/v1>. 'GHCNr' main functionalities consist of downloading data from GHCNd, filter it, and to aggregate it at monthly and annual scales.
License: MIT + file LICENSE
Imports:
    tibble,
    dplyr,
    tidyr,
    readr,
    tidyselect,
    httr2,
    terra,
    utils,
    rlang,
    curl
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Depends: 
    R (>= 2.10)
LazyData: true
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check one or more appropriate boxes below):

    Data Lifecycle Packages

    • [x] data retrieval
    • [ ] data extraction
    • [x] data munging
    • [ ] data deposition
    • [x] data validation and testing
    • [ ] workflow automation
    • [ ] version control
    • [ ] citation management and bibliometrics
    • [ ] scientific software wrappers
    • [ ] field and lab reproducibility tools
    • [ ] database software bindings
    • [x] geospatial data
    • [ ] text analysis

    Statistical Packages

    • [ ] Bayesian and Monte Carlo Routines
    • [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
    • [ ] Machine Learning
    • [ ] Regression and Supervised Learning
    • [ ] Exploratory Data Analysis (EDA) and Summary Statistics
    • [ ] Spatial Analyses
    • [ ] Time Series Analyses
    • [ ] Probability Distributions
  • GHCNr retrieves weather time series data from the Global Historical Climatology Network daily (GHCNd) database, a repository of field station data. In addition to retrieval, the time series is cleaned by removing flagged records and processed to obtain monthly, annual, and normal statistics.

  • The target audience is ecologists who want to access the GHCNd database through R. The scientific applications of this package are analyses of ecological phenomena that require on-site daily time series.

  • The FedData package (https://github.com/ropensci/FedData) also retrieves GHCNd data, but it does not perform record flagging or automatic cleaning; GHCNr does both. Additionally, GHCNr improves reproducibility and usability by reducing the extra data processing needed to work with FedData output. GHCNr contains the following additional functionalities: 0) Calculating temporal coverage of the time series. 1) Aggregating daily time series to monthly, annual, and normal time series. 2) Calculating temperature anomalies using a reference period. 3) Plotting functions.

  • GHCNr is deposited on CRAN: https://cran.r-project.org/package=GHCNr.

emilio-berti avatar Apr 01 '25 12:04 emilio-berti

@ropensci-review-bot check package

maurolepore avatar Apr 02 '25 21:04 maurolepore

Thanks, about to send the query.

ropensci-review-bot avatar Apr 02 '25 21:04 ropensci-review-bot

:rocket:

The following problems were found in your submission template:

  • submission type must be one of [Standard, Estandar, Stats]
  • HTML variable [editor] is missing
  • HTML variable [reviewers-list] is missing
  • HTML variable [due-dates-list] is missing

Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

:wave:

ropensci-review-bot avatar Apr 02 '25 21:04 ropensci-review-bot

Checks for GHCNr (v1.4.5)

git hash: 01ec4426

  • :heavy_check_mark: Package is already on CRAN.
  • :heavy_multiplication_x: does not have a 'codemeta.json' file.
  • :heavy_multiplication_x: does not have a 'contributing' file.
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_multiplication_x: 'DESCRIPTION' does not have a URL field.
  • :heavy_multiplication_x: 'DESCRIPTION' does not have a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_multiplication_x: These functions do not have examples: [.add_variables, .api_error, .check_flags, .daily_request, .daily_url, .drop_flags, .elevation_url, .extract_flag, .flags, .inventory_url, .max, .mean, .min, .missing_variables, .s3_annual, .s3_anomaly, .s3_daily, .s3_monthly, .s3_quarterly, .sum].
  • :heavy_multiplication_x: Continuous integration checks unavailable (no URL in 'DESCRIPTION').
  • :heavy_multiplication_x: Package coverage is 35.5% (should be at least 75%).
  • :heavy_multiplication_x: Default GitHub branch of 'master' is not acceptable.
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.
  • :eyes: Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with :eyes: may be optionally addressed.)

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 88 |
| internal | GHCNr | 19 |
| internal | stats | 9 |
| internal | grDevices | 4 |
| internal | graphics | 1 |
| imports | dplyr | 87 |
| imports | tidyselect | 24 |
| imports | tidyr | 7 |
| imports | tibble | 5 |
| imports | utils | 5 |
| imports | curl | 4 |
| imports | readr | 3 |
| imports | httr2 | 1 |
| imports | terra | 1 |
| imports | rlang | NA |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

length (8), c (6), colnames (6), for (6), intersect (5), max (5), min (5), args (4), ifelse (4), list (4), tryCatch (4), unique (4), by (3), seq.Date (3), sum (3), body (2), errorCondition (2), lapply (2), mean (2), message (2), url (2), matrix (1), paste (1), paste0 (1), rowSums (1), seq_along (1), setdiff (1)

dplyr

mutate (20), select (16), group_by (9), summarize (8), filter (7), rename_with (5), across (3), distinct_all (3), full_join (3), left_join (3), bind_rows (2), pull (2), relocate (2), arrange (1), bind_cols (1), case_when (1), group_split (1)

tidyselect

contains (12), any_of (6), all_of (3), matches (3)

GHCNr

as_daily (2), coverage (2), get_country (2), stations (2), annual (1), annual_coverage (1), anomaly (1), daily (1), download_inventory (1), elevation_stations (1), filter_stations (1), get_countries (1), monthly (1), monthly_coverage (1), period_coverage (1)

stats

interaction.plot (9)

tidyr

pivot_longer (4), pivot_wider (3)

tibble

tibble (4), as_tibble (1)

utils

data (5)

curl

has_internet (4)

grDevices

palette (4)

readr

fwf_positions (1), read_fwf (1), read_table (1)

graphics

par (1)

httr2

resp_status (1)

terra

vect (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 15 files) and
  • 1 authors
  • 1 vignette
  • 3 internal data files
  • 10 imported packages
  • 42 exported functions (median 11 lines of code)
  • 43 non-exported functions in R (median 19 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 15 | 71.3 | |
| files_vignettes | 1 | 61.7 | |
| files_tests | 8 | 83.6 | |
| loc_R | 863 | 61.4 | |
| loc_vignettes | 157 | 38.8 | |
| loc_tests | 130 | 41.6 | |
| num_vignettes | 1 | 58.7 | |
| data_size_total | 63702 | 79.9 | |
| data_size_median | 26506 | 84.7 | |
| n_fns_r | 85 | 70.7 | |
| n_fns_r_exported | 42 | 84.4 | |
| n_fns_r_not_exported | 43 | 61.8 | |
| n_fns_per_file_r | 3 | 53.2 | |
| num_params_per_fn | 1 | 1.8 | TRUE |
| loc_per_fn_r | 13 | 39.9 | |
| loc_per_fn_r_exp | 11 | 25.4 | |
| loc_per_fn_r_not_exp | 19 | 60.6 | |
| rel_whitespace_R | 12 | 51.3 | |
| rel_whitespace_vignettes | 18 | 19.7 | |
| rel_whitespace_tests | 8 | 21.8 | |
| doclines_per_fn_exp | 15 | 6.6 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 68 | 71.3 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)


3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fails:

  1. description_url
  2. description_bugreports

Test coverage with covr

Package coverage: 35.53

The following files are not completely covered by tests:

| file | coverage |
|---|---|
| R/anomaly.R | 0% |
| R/daily.R | 22.09% |
| R/elevation.R | 0% |
| R/get-country-shapefile.R | 0% |
| R/plotting.R | 0% |
| R/quarterly.R | 0% |
| R/s3_classes.R | 61.54% |
| R/stations.R | 2.44% |
| R/utils.R | 61.76% |

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 32 potential issues:

| message | number of times |
|---|---|
| Avoid 1:nrow(...) expressions, use seq_len. | 1 |
| Avoid using sapply, consider vapply instead, that's type safe | 2 |
| Lines should not be more than 80 characters. This line is 100 characters. | 2 |
| Lines should not be more than 80 characters. This line is 101 characters. | 1 |
| Lines should not be more than 80 characters. This line is 102 characters. | 1 |
| Lines should not be more than 80 characters. This line is 105 characters. | 1 |
| Lines should not be more than 80 characters. This line is 106 characters. | 3 |
| Lines should not be more than 80 characters. This line is 108 characters. | 1 |
| Lines should not be more than 80 characters. This line is 111 characters. | 1 |
| Lines should not be more than 80 characters. This line is 113 characters. | 1 |
| Lines should not be more than 80 characters. This line is 82 characters. | 2 |
| Lines should not be more than 80 characters. This line is 83 characters. | 3 |
| Lines should not be more than 80 characters. This line is 85 characters. | 2 |
| Lines should not be more than 80 characters. This line is 86 characters. | 1 |
| Lines should not be more than 80 characters. This line is 90 characters. | 1 |
| Lines should not be more than 80 characters. This line is 92 characters. | 1 |
| Lines should not be more than 80 characters. This line is 93 characters. | 1 |
| Lines should not be more than 80 characters. This line is 95 characters. | 1 |
| Lines should not be more than 80 characters. This line is 96 characters. | 2 |
| Lines should not be more than 80 characters. This line is 98 characters. | 1 |
| Lines should not be more than 80 characters. This line is 99 characters. | 2 |
| Missing chunk end for chunk (maybe starting at line 64). | 1 |
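
The first two lintr messages concern base R behaviour that can be shown in isolation. The sketch below is not taken from the GHCNr source; the `stations` data frame and `ids` vector are hypothetical stand-ins, chosen only to show why `seq_len()` and `vapply()` are the safer choices when the input is empty.

```r
# Hypothetical empty table of station records (not from GHCNr).
stations <- data.frame(id = character(0), elevation = numeric(0))

# 1:nrow(x) evaluates to c(1, 0) when x has no rows,
# so a loop over it would run twice instead of not at all.
n_colon <- length(1:nrow(stations))
n_colon  # 2

# seq_len(nrow(x)) is empty when x has no rows, so a loop over it never runs.
n_seq <- length(seq_len(nrow(stations)))
n_seq    # 0

# sapply() chooses its return type from the data: an empty input gives a list.
ids <- character(0)
class_sapply <- class(sapply(ids, nchar))
class_sapply  # "list"

# vapply() always returns the declared type, here integer.
class_vapply <- class(vapply(ids, nchar, integer(1)))
class_vapply  # "integer"
```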

4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 8 function names are duplicated in other packages:

    • .mean from treeclim
    • .sum from treeclim
    • anomaly from satin
    • coverage from actuar, BAT, binomSamSize, clustAnalytics, Cyclops, DiceDesign, indicspecies, LPCM, mldr, mosaicCore, mosaicModel, peptider, Rgbp, ritis, SAEval, simMetric
    • daily from almanac
    • get_countries from imdbapi, povcalnetR
    • monthly from almanac
    • stations from altfuelr

Package Versions

| package | version |
|---|---|
| pkgstats | 0.2.0.54 |
| pkgcheck | 0.1.2.123 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

ropensci-review-bot avatar Apr 02 '25 21:04 ropensci-review-bot

@emilio-berti thanks so much for sharing your work with rOpenSci and congratulations for publishing it on CRAN.

At first glance the package seems to be in scope, but the final decision goes through the editorial board. Before I pass it along, I need to better understand the case for this package, with a special focus on its overlap with {FedData}.

Also, it would be nice to try to address as many of the issues detected by {pkgcheck} as possible:

{pkgcheck} shows multiple problems (https://github.com/ropensci/software-review/issues/696#issuecomment-2773774470). Please expand the drop-downs to see details.

✔️ Package is already on CRAN.
✖️ does not have a 'codemeta.json' file.
✖️ does not have a 'contributing' file.
✔️ uses 'roxygen2'.
✖️ 'DESCRIPTION' does not have a URL field.
✖️ 'DESCRIPTION' does not have a BugReports field.
✔️ Package has at least one HTML vignette
✖️ These functions do not have examples: [.add_variables, .api_error, .check_flags, .daily_request, .daily_url, .drop_flags, .elevation_url, .extract_flag, .flags, .inventory_url, .max, .mean, .min, .missing_variables, .s3_annual, .s3_anomaly, .s3_daily, .s3_monthly, .s3_quarterly, .sum].
✖️ Continuous integration checks unavailable (no URL in 'DESCRIPTION').
✖️ Package coverage is 35.5% (should be at least 75%).
✖️ Default GitHub branch of 'master' is not acceptable.
✔️ R CMD check found no errors.
✔️ R CMD check found no warnings.
👀 Function names are duplicated in other packages
Important: All failing checks above must be addressed prior to proceeding
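
As a minimal sketch, the two missing DESCRIPTION fields could look like the fragment below. The repository URL is the one given in the submission; the issues URL is an assumption that follows the usual GitHub layout. Per the check output, adding the URL field should also let the bot locate the continuous integration checks.

```
URL: https://github.com/emilio-berti/GHCNr
BugReports: https://github.com/emilio-berti/GHCNr/issues
```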

While I'm here, I'll share a few preliminary checks. Whatever our decision I hope some of this helps as feedback.

Preliminary checks

- [ ] **Documentation**: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,

    - [ ] Is the case for the package well made?
    > On the package repo and its vignette I see examples but not a case for the package. 
    > It seems important to compare it with the other related projects, e.g. the FedData package (https://github.com/ropensci/FedData) .

    - [ ] Is the reference index page clear (grouped by topic if necessary)?
    > There is no pkgdown website. 
    > The manual shows an API of considerable size so eventually the Reference will benefit from such grouping (https://cran.r-project.org/web/packages/GHCNr/GHCNr.pdf).

    - [ ] Are vignettes readable, sufficiently detailed and not just perfunctory?
    > The vignette and the repo home page seem to show the same content. 
    > README.md is not at the root but under .github/

- [ ] **Fit**: The package meets criteria for [fit](https://devguide.ropensci.org/policies.html#package-categories) and [overlap](https://devguide.ropensci.org/policies.html#overlap).
> It would be nice to see in README an explanation of the overlap with {FedData}

- [ ] **Installation instructions:** Are installation instructions clear enough for human users?
 > I see no installation instructions.

- [ ] **Tests**: If the package has some interactivity / HTTP / plot production etc. are the tests using [state-of-the-art tooling](https://devguide.ropensci.org/building.html#testing)?
> It would be nice to map each R/f.R to a corresponding tests/testthat/test-f.R

- [ ] **Contributing information**: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
> I see no CONTRIBUTING.md or equivalent instructions.

- [x] **License:** The package has a CRAN or OSI accepted license.

- [x] **Project management**: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?


Comments

In the roxygen2 documentation we recommend markdown syntax

# Now
#' \emph{station_id} can be a vector with multiple stations.

# Equivalent
#' *station_id* can be a vector with multiple stations.

# Recommended: If it's a variable then consider `code` syntax. See the tidyverse style guide.
#' `station_id` can be a vector with multiple stations.

See CRAN NOTEs at https://cran.r-project.org/web/checks/check_results_GHCNr.html

Check Details
Version: 1.4.5
Check: DESCRIPTION meta-information
Result: NOTE 
    Missing dependency on R >= 4.1.0 because package code uses the pipe
    |> or function shorthand \(...) syntax added in R 4.1.0.
    File(s) using such syntax:
      ‘annual.R’ ‘anomaly.R’ ‘coverage.R’ ‘daily.R’ ‘flags.R’
      ‘get-country-shapefile.R’ ‘monthly.R’ ‘quarterly.R’ ‘s3_classes.R’
      ‘stations.R’ ‘utils.R’
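
One way to clear this NOTE, assuming the package does not need to support R versions older than 4.1.0, is to raise the minimum R version in DESCRIPTION, which currently declares `R (>= 2.10)`:

```
Depends: 
    R (>= 4.1.0)
```

The alternative would be to remove every use of `|>` and `\(...)` from the listed files, which is a larger change.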

maurolepore avatar Apr 02 '25 21:04 maurolepore

This marks the end of my EiC submission. Here's a short summary for the next EiC:

  • This is a pre-submission and I left some pre-checks here.
  • The bot's checks show issues that need to be addressed before we're ready for a full submission.

@emilio-berti thanks for sharing your work with rOpenSci!

maurolepore avatar May 04 '25 22:05 maurolepore

@emilio-berti Any updates on addressing the issues flagged above?

mpadge avatar May 23 '25 10:05 mpadge

@emilio-berti I'm going to close this issue now due to lack of response. If you are interested in proceeding, feel free to re-open at any stage to continue the conversation. Thank you.

mpadge avatar Jun 23 '25 08:06 mpadge