usethis Function to create initial data docs

Hi! Documenting data is always tricky, so having a function to help people document a dataset would be fantastic. This function could create an .R file in the R folder containing information gathered from the dataset. For example, the information can be inserted in a roxygen template, and the \item entries (one per variable) can be filled in using glue(). The template could also include some descriptive information to help users understand the dataset better.
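
As a rough illustration of the idea, here is a minimal sketch that builds a roxygen skeleton from a data frame. The helper name `data_doc_skeleton()` is hypothetical and not part of usethis, and base `sprintf()` stands in for glue() only to keep the example free of dependencies.

```r
# Hypothetical helper (not part of usethis): build a roxygen2 skeleton
# for a data frame as a character vector, one element per line.
data_doc_skeleton <- function(data, name) {
  stopifnot(is.data.frame(data), is.character(name), length(name) == 1)

  # One \item{column}{placeholder} entry per variable; the first class of
  # each column is pre-filled as a hint for whoever writes the description.
  col_classes <- vapply(data, function(col) class(col)[1], character(1))
  items <- sprintf(
    "#'   \\item{%s}{(%s) TODO: describe this variable.}",
    names(data),
    unname(col_classes)
  )

  c(
    "#' TODO: title",
    "#'",
    "#' TODO: description",
    "#'",
    sprintf(
      "#' @format A data frame with %d rows and %d variables:",
      nrow(data), ncol(data)
    ),
    "#' \\describe{",
    items,
    "#' }",
    "#' @source TODO: where the data came from",
    sprintf("\"%s\"", name)
  )
}

# Print the skeleton for the first three columns of a built-in dataset.
cat(data_doc_skeleton(mtcars[1:3], "mtcars"), sep = "\n")
```

The TODO placeholders mark what only the author can supply; the row count, column count, names, and classes come straight from the object.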

Sep 20 '22 02:09 focardozom

Would you like to make a PR for consideration? Full disclosure: I'm not 100% convinced that usethis should do this. But this is the topic of an issue I recently closed in R Packages, which contains some concrete ideas to start with.

https://github.com/hadley/r-pkgs/issues/707

Sep 20 '22 02:09 jennybc

As I continue to work on R Packages, I've learned there are packages that already offer functions to do this. One example is sinew (https://cran.r-project.org/web/packages/sinew/index.html). So given that there are solutions out there already, I don't think it's a priority for us to add this to usethis.

Sep 21 '22 15:09 jennybc

Just wanted to add to this since I had the same thought. sinew does not fit well into the package development ecosystem for usethis users: it prints a string rather than creating a file, doesn't use Markdown syntax, doesn't check for existing roxygen comments, etc. usethis offers use_package_doc(), which is a major help, and I think the same could be done by a function, e.g., use_data_doc(), which takes in a dataset and creates a file with a roxygen skeleton. It could also be an argument to use_data(), e.g., use_data(., doc = TRUE), which saves the dataset to data/ and creates a documentation file. Thank you for considering and making such a useful package!

Nov 09 '22 21:11 ngreifer

Hi, based on @jennybc's suggestions, I created this function to help with the documentation. It is still under construction, so any comments are welcome. I like your idea of use_data(., doc = TRUE), so maybe this issue will be re-opened in the future. I am also a big fan of usethis.

Nov 10 '22 01:11 focardozom

OK, we'll reconsider.

Nov 10 '22 02:11 jennybc

We probably also need to think a little about how we organise data documentation files. We currently tend to dump all data docs into a single .R file, but that's obviously going to be harder to edit with a script. Maybe we should move to a convention where we have data/foo.rda, data-raw/foo.R, and R/data-foo.R?
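
To make the one-file-per-dataset idea concrete, here is a minimal sketch of writing documentation lines to R/data-<name>.R and refusing to overwrite an existing file. The function `write_data_doc()` and its arguments are hypothetical and only illustrate the convention; this is not how usethis does or would implement it.

```r
# Hypothetical helper: write roxygen lines to <pkg>/R/data-<name>.R,
# stopping if that file already exists so hand-written docs are never lost.
write_data_doc <- function(lines, name, pkg = ".") {
  stopifnot(is.character(lines), is.character(name), length(name) == 1)

  path <- file.path(pkg, "R", paste0("data-", name, ".R"))
  if (file.exists(path)) {
    stop("Documentation file already exists: ", path, call. = FALSE)
  }

  # Create the R/ directory if the package skeleton does not have it yet.
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(lines, path)
  invisible(path)
}

# Try it in a throwaway directory rather than a real package.
demo_pkg <- tempfile("demo-pkg-")
doc_path <- write_data_doc(c("#' TODO: title", "\"foo\""), "foo", pkg = demo_pkg)
basename(doc_path)
```

With one file per dataset, a script only has to check whether a single predictable path exists, instead of parsing a shared .R file to find the right roxygen block.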

Jan 18 '23 15:01 hadley

A small suggestion for a new convention: R/data-foo.R

Jan 18 '23 15:01 ijlyttle

Ooops, that's what I meant to type.

Jan 18 '23 15:01 hadley

The FAIR framework can be a helpful resource for organizing dataset documentation files. From my experience, most researchers only describe the variables in the dataset, but including good metadata can make the datasets more valuable and useful. Thank you for considering this issue as a potential feature.

Jan 19 '23 16:01 focardozom

@focardozom I'm not familiar with FAIR. Can you please summarise how it might inform a function that automatically creates a documentation template?

Jan 19 '23 16:01 hadley

@hadley FAIR can be used as a checklist to decide what information should be included in the template created by the function. Following FAIR, the template should include two categories: (1) metadata, meaning information that helps others find, access, and use the data, such as details about how the data was gathered, licensing, file size, and format. Some of this information can be extracted automatically from the data object and included in the template, while the rest should be left as prompts for the user to fill in. (2) Spaces to describe the variables. Users can use this template to ensure that they include at least the basic elements recommended by guides like FAIR.
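
As a small sketch of the "automatically extracted" part, the following collects facts that can be read directly off a data frame. The helper name `data_auto_metadata()` and the choice of fields are assumptions made for illustration; FAIR itself does not prescribe these exact fields.

```r
# Hypothetical helper: gather metadata that can be computed from the
# object itself, leaving provenance and licensing for the user to fill in.
data_auto_metadata <- function(data) {
  stopifnot(is.data.frame(data))

  list(
    n_rows     = nrow(data),
    n_cols     = ncol(data),
    # In-memory size in bytes; the size of the saved file will differ.
    size_bytes = as.numeric(utils::object.size(data)),
    # First class of each column, e.g. "numeric" or "factor".
    classes    = vapply(data, function(col) class(col)[1], character(1)),
    # Number of missing values per column.
    n_missing  = vapply(data, function(col) sum(is.na(col)), integer(1))
  )
}

meta <- data_auto_metadata(iris)
str(meta)
```

Items such as how the data was gathered or its license cannot be derived this way, so a template would still need to prompt for them.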

Jan 20 '23 05:01 focardozom

I have been talking with my mentor, @raymondbalise. We looked at how you documented datasets in ggplot2, and we see your point now. Dr. Balise teaches us to use R/foo.R and R/bar.R, and he commented that R/data-foo.R is a great idea.

I have been reviewing https://design.tidyverse.org and I would love to apply what I have learned. Can I see how you are coding this or can I help?

Feb 06 '23 20:02 focardozom

Labelling for tidyverse dev day. Overall advice: start small, aim for an MVP (so: probably not everything you see discussed above).

Jul 22 '24 23:07 jennybc
