usethis
Function to create initial data docs
Hi! Documenting data is always tricky, so a function that helps people document a dataset would be fantastic. This function could create an .R file in the R folder containing information gathered from the dataset. For example, that information could be inserted into a roxygen template, with the \item{} entries of the @format section filled in using glue(). The template could also include some descriptive information to help users understand the dataset better.
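A minimal sketch of the idea, assuming a data frame as input. `data_doc_skeleton()` is a hypothetical name, not an existing usethis function, and base R `sprintf()` stands in for `glue()` so the sketch has no dependencies. It returns roxygen lines with one `\item{}` per column, pre-filled with the column's class:

```r
# Hypothetical helper (not part of usethis): build a roxygen2 skeleton for a
# data frame, with one \item{}{} entry per column.
data_doc_skeleton <- function(df, name) {
  stopifnot(is.data.frame(df))
  # One \item line per column, pre-filled with the column's class
  items <- vapply(
    names(df),
    function(col) {
      sprintf("#'   \\item{%s}{<%s> TODO: describe this variable}",
              col, class(df[[col]])[1])
    },
    character(1),
    USE.NAMES = FALSE
  )
  c(
    sprintf("#' %s", name),  # title
    "#'",
    "#' TODO: describe the dataset.",
    "#'",
    sprintf("#' @format A data frame with %d rows and %d columns:",
            nrow(df), ncol(df)),
    "#' \\describe{",
    items,
    "#' }",
    "#' @source TODO: where does the data come from?",
    sprintf("\"%s\"", name)  # roxygen documents a dataset via its quoted name
  )
}
```

For example, `cat(data_doc_skeleton(mtcars[1:3], "cars"), sep = "\n")` prints a stub whose @format line reads "A data frame with 32 rows and 3 columns:", followed by items for mpg, cyl and disp.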
Would you like to make a PR for consideration? Full disclosure: I'm not 100% convinced that usethis should do this. But this is the topic of an issue I recently closed in R Packages, which contains some concrete ideas to start with.
https://github.com/hadley/r-pkgs/issues/707
As I continue to work on R Packages, I've learned there are packages that already offer functions to do this. One example is sinew (https://cran.r-project.org/web/packages/sinew/index.html). So given that there are solutions out there already, I don't think it's a priority for us to add this to usethis.
Just wanted to add to this since I had the same thought. sinew does not fit well into the package development ecosystem for usethis users: it prints a string rather than creating a file, doesn't use Markdown syntax, doesn't check for existing roxygen comments, etc. usethis offers use_package_doc(), which is a major help, and I think the same could be done by a function, e.g., use_data_doc(), that takes in a dataset and creates a file with a roxygen skeleton. It could also be an argument to use_data(), e.g., use_data(., doc = TRUE), which saves the dataset to /data and creates a documentation file. Thank you for considering and making such a useful package!
Hi, based on @jennybc's suggestions, I created this function to help with the documentation. It is still under construction, so any comments are welcome. I like your idea of use_data(., doc = TRUE), so maybe they will reopen this issue in the future. I am also a big fan of usethis.
OK, we'll reconsider.
We probably also need to think a little about how we organise data documentation files. We currently tend to dump all data docs into a single .R file, but that's obviously going to be harder to edit with a script. Maybe we should move to a convention where we have data/foo.Rd, data-raw/foo.R, and R/data-foo.R?
A small suggestion for a new convention: R/data-foo.R
Oops, that's what I meant to type.
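Under the proposed R/data-foo.R convention, each dataset gets its own documentation file, which makes it easy for a script to create one without touching existing docs. A rough sketch, assuming a hypothetical `write_data_doc()` helper (not part of usethis) that writes a stub to R/data-<name>.R and refuses to overwrite a file that already exists:

```r
# Hypothetical helper (not a usethis function): write a roxygen stub to
# R/data-<name>.R, the documentation file in the proposed convention.
write_data_doc <- function(name, pkg_dir = ".") {
  doc_path <- file.path(pkg_dir, "R", paste0("data-", name, ".R"))
  # Never clobber documentation the user has already written
  if (file.exists(doc_path)) {
    stop("'", doc_path, "' already exists; not overwriting.", call. = FALSE)
  }
  dir.create(dirname(doc_path), recursive = TRUE, showWarnings = FALSE)
  writeLines(
    c(
      sprintf("#' %s", name),
      "#'",
      "#' TODO: describe the dataset.",
      "#'",
      "#' @format TODO",
      "#' @source TODO",
      sprintf("\"%s\"", name)
    ),
    doc_path
  )
  invisible(doc_path)
}
```

One file per dataset also means a later call for a different dataset never has to parse or rewrite someone else's roxygen block; a second call for the same dataset simply stops with an error.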
The FAIR framework can be a helpful resource for organizing dataset documentation files. From my experience, most researchers only describe the variables in the dataset, but including good metadata can make the datasets more valuable and useful. Thank you for considering this issue as a potential feature.
@focardozom I'm not familiar with FAIR. Can you please summarise how it might inform a function that automatically creates a documentation template?
@hadley FAIR can be used as a checklist for deciding what information the template created by the function should include. Following FAIR, the template would cover two categories: (1) metadata, i.e. information that helps others find, access, and use the data, such as how the data were gathered, licensing, file size, and format. Some of this can be extracted automatically from the data object and included in the template; the rest should be left as prompts for the user to fill in. (2) Space to describe each variable. With such a template, users would at least include the basic elements recommended by guides like FAIR.
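A sketch of how the metadata category could map onto generated roxygen, assuming a hypothetical `data_metadata_skeleton()` helper. Fields that can be read off the object (dimensions, missing values, in-memory size as a stand-in for file size, which is only known once the data is saved) are filled in; fields only the author knows are left as TODO prompts. The field names are illustrative, not taken from the FAIR principles themselves, and the variable descriptions would still go in the usual @format section:

```r
# Hypothetical helper: a roxygen "Metadata" section mixing automatically
# extracted facts with placeholders the author must complete.
data_metadata_skeleton <- function(df) {
  stopifnot(is.data.frame(df))
  # Fields that can be extracted automatically from the object
  auto <- c(
    sprintf("Rows: %d", nrow(df)),
    sprintf("Columns: %d", ncol(df)),
    sprintf("Missing values: %d", sum(is.na(df))),
    sprintf("Size in memory: %s", format(object.size(df), units = "Kb"))
  )
  # Fields only the author can supply; left as explicit TODOs
  manual <- c(
    "Source: TODO (who collected the data, when, and how)",
    "License: TODO",
    "Access: TODO (where the original data can be obtained)"
  )
  c(
    "#' @section Metadata:",
    "#' \\itemize{",
    paste0("#'   \\item ", c(auto, manual)),
    "#' }"
  )
}
```

For example, `data_metadata_skeleton(airquality)` reports 153 rows, 6 columns and 44 missing values, and leaves source, license and access for the author.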
I am talking with my mentor @raymondbalise. We looked at how you documented datasets in ggplot2, and now we see your point. Dr. Balise teaches us to use R/foo.R and R/bar.R. He commented that R/data-foo.R is a great idea.
I have been reviewing https://design.tidyverse.org and I would love to apply what I have learned. Can I see how you are coding this or can I help?
Labelling for tidyverse dev day. Overall advice: start small, aim for an MVP (so: probably not everything you see discussed above).