Standard for import methods

Open chainsawriot opened this issue 7 years ago • 6 comments

I think it would be a good idea to specify a standard for import methods in CONTRIBUTING.md. By "standard" I mean argument names such as path, which and header, because there are two "standards" for these arguments:

  1. base functions use file, which and header
  2. readxl uses path, sheet and col_names

From reading the code, rio uses file, which and header, but I think it would be good to make this explicit. As far as I know, some packages (such as googlesheets and readODS) try to emulate the interface of readxl.
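The difference between the two conventions can be sketched as a simple name mapping. The helper `to_readxl_names` and the `base_to_readxl` vector below are hypothetical (neither is part of rio or readxl); they only rename base/rio-style arguments to the readxl-style counterparts listed above.

```r
# Mapping from the base/rio-style argument names to the readxl-style names
# listed above. Hypothetical illustration only; not part of rio or readxl.
base_to_readxl <- c(file = "path", which = "sheet", header = "col_names")

# Rename the elements of an argument list; names without a mapping are kept.
to_readxl_names <- function(args) {
  old <- names(args)
  hit <- old %in% names(base_to_readxl)
  names(args)[hit] <- unname(base_to_readxl[old[hit]])
  args
}

# A call written with rio-style names ...
rio_style <- list(file = "data.xlsx", which = 2, header = TRUE, skip = 1)

# ... and the same arguments under readxl-style names.
readxl_style <- to_readxl_names(rio_style)
names(readxl_style)
```

Whichever convention is chosen, documenting a mapping like this in one place would let contributors see at a glance which name a new import method should expose.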

chainsawriot, Oct 13 '16 16:10

Good points. Need to consider how to standardize this.

leeper, Oct 14 '16 13:10

I've been thinking about this. The first step is really a cross-walk to show how attribute names vary across file formats. Something like:

Attribute             Stata  SPSS  SAS
Variable name         ...    ...   ...
Variable description  ...    ...   ...
Variable class/type   ...    ...   ...
Value labels          ...    ...   ...

leeper, Mar 01 '17 17:03

Could the crosswalk spreadsheet I posted in #228 be germane to this?

bokov, Oct 14 '19 11:10

Proposed standard for import methods:

When writing new methods, where practical, the following format is recommended:

.import.rio_SUFFIX <- function(file, ARG1 = VAL1, ARG2 = VAL2, ...) {
    requireNamespace("PARENT_PACKAGE")
    # pass the arguments themselves (not their defaults) so user-supplied values are honored
    arg_reconcile(PARENT_PACKAGE::READER_FUNCTION, FILEARG = file, ARG1 = ARG1, ARG2 = ARG2, ..., .docall = TRUE)
}

In the above template, SUFFIX is replaced by the file extension for that format, without the leading dot (xlsx, csv, dta, etc.). You can optionally define any number of additional arguments that the reader function recognizes (represented by ARG1, ARG2, etc.) along with their default values (VAL1, VAL2, etc.). The ellipsis ... is required in your function definition and should be passed on to arg_reconcile to avoid errors when the same code is used to import files with different formats. Always use .docall=TRUE unless you intend to capture the normalized argument list and process it further before passing it to the reader function.

PARENT_PACKAGE is the package that provides the reader function (e.g. for read_xlsx it would be "readxl", so PARENT_PACKAGE::READER_FUNCTION would be readxl::read_xlsx). FILEARG is the name of the argument that the reader function uses for the file (e.g. for read_xlsx it is path, so the argument mapping would begin with path = file). For more details, please see ?rio:::arg_reconcile
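As a rough illustration of the template, here is a self-contained sketch. It does not call rio's internal arg_reconcile; instead `reconcile_sketch` is a simplified stand-in that assumes the behavior described above (drop arguments the reader does not declare, then call it). `toy_reader` and the `toy` suffix are likewise hypothetical, standing in for a reader with readxl-style argument names.

```r
# Hypothetical reader using readxl-style argument names (path, sheet, col_names).
# It just echoes its arguments so the mapping can be inspected.
toy_reader <- function(path, sheet = 1, col_names = TRUE) {
  list(path = path, sheet = sheet, col_names = col_names)
}

# Simplified stand-in for rio:::arg_reconcile (an assumption, not rio's code):
# keep only the arguments that `fun` declares, then optionally call it.
reconcile_sketch <- function(fun, ..., .docall = FALSE) {
  args <- list(...)
  keep <- names(args) %in% names(formals(fun))
  args <- args[keep]
  if (.docall) do.call(fun, args) else args
}

# Import method following the proposed template, for a made-up "toy" format.
# rio-style names (file, which, header) are mapped to the reader's own names.
.import.rio_toy <- function(file, which = 1, header = TRUE, ...) {
  reconcile_sketch(toy_reader, path = file, sheet = which,
                   col_names = header, ..., .docall = TRUE)
}

# `encoding` is not declared by toy_reader, so it is dropped instead of
# causing an "unused argument" error.
result <- .import.rio_toy("example.toy", which = 2, encoding = "UTF-8")
str(result)
```

The point of forwarding ... through the reconciling step is visible in the last call: an argument meant for some other format's reader passes through harmlessly. The real arg_reconcile offers more options than this stand-in, so its help page remains the reference.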

bokov, Mar 08 '21 22:03

@leeper, should I submit a PR on CONTRIBUTING.md?

bokov, Mar 08 '21 22:03

@bokov Please do!

leeper, Mar 19 '21 12:03