automatic creation of codebook and metadata of a dataset

Open RMHogervorst opened this issue 8 years ago • 4 comments

I thought about a semi-automatic creation of dataset description.

Metadata about a dataset is often created after the fact, which takes a lot of work (and is therefore often not done). Perhaps some sort of documentation maker that summarizes values and leaves space to describe columns would help people out.

  • Display a summary of a data.frame or tbl_df with a few commands
  • It should display information like str() and summary(), but more
  • The end result could be an HTML file, for example
  • Or it could prepopulate an R Markdown file, so you only need to tell R what type of variable each column is and add a short description.
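
A minimal sketch of what such a summary step could look like, using base R only. The function name `make_codebook` and its column layout are hypothetical, chosen here for illustration; they are not taken from the summarize_dat repo or from any existing package.

```r
# Hypothetical helper: build a codebook skeleton for a data.frame.
# One row per column, with an empty `description` field to fill in by hand.
make_codebook <- function(df) {
  stopifnot(is.data.frame(df))

  # Unique non-missing values of a column, reused by two summaries below.
  non_missing_unique <- function(x) unique(x[!is.na(x)])

  data.frame(
    variable  = names(df),
    class     = vapply(df, function(x) class(x)[1], character(1),
                       USE.NAMES = FALSE),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1),
                       USE.NAMES = FALSE),
    n_unique  = vapply(df, function(x) length(non_missing_unique(x)),
                       integer(1), USE.NAMES = FALSE),
    # Up to three example values, collapsed into a single string.
    example   = vapply(df, function(x) {
                  paste(utils::head(non_missing_unique(x), 3), collapse = ", ")
                }, character(1), USE.NAMES = FALSE),
    description = "",
    stringsAsFactors = FALSE
  )
}

cb <- make_codebook(iris)
print(cb)
```

The resulting data.frame could be written out with `write.csv(cb, "codebook.csv", row.names = FALSE)` so that the description column can be completed manually, or used as the input for an HTML or R Markdown report.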

My repo contains only some ideas about what it should do: https://github.com/RMHogervorst/summarize_dat/

RMHogervorst avatar Mar 10 '16 17:03 RMHogervorst

Good idea! Relates to the larger issue of how to handle metadata in R.

Some relevant links:

  • rOpenSci's EML package
  • The memisc package has a codebook() function for its data.frame-like "dataset" class.
  • The DDIwR package allegedly writes Data Documentation Initiative metadata, which would probably be a reasonable standard to write a complete R client for. There's also an archived package, spssDDI, which attempted to make DDI out of an SPSS data file.
  • Stata has a codebook function that might be a useful model for output.
  • Also for Stata, qobgook is a module that writes this fine looking, but use-case specific codebook.
  • For the reverse process (codebook to data), I've always found this codebook parsing post on R-bloggers interesting.

leeper avatar Mar 10 '16 18:03 leeper

Thanks for the ideas, leeper. I have looked at the memisc and DDIwR packages before. It would be nice if the metadata of a dataset were automatically created in a standard format.

RMHogervorst avatar Mar 10 '16 21:03 RMHogervorst

Also check the latest OKFN Tabular Data Package specification: http://data.okfn.org/doc/tabular-data-package (it uses JSON codebooks). This standard seems to be getting some traction as well.

mbacou avatar Apr 04 '16 03:04 mbacou

Ideally, some form of documentation is generated, users can add to it, and this information is then translated into DDI style, JSON format, and others. These exporters could be lightweight add-ons that are user-specific.
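
As a rough sketch of the "generate, annotate, then translate" idea, the snippet below builds a nested list with one entry per column and serializes it to JSON only if the jsonlite package happens to be installed. The function names, the mapping from R classes to type names, and the name/type/description field layout are all assumptions made for illustration; the layout only loosely resembles a JSON table schema and is not a complete or validated Tabular Data Package descriptor.

```r
# Assumed mapping from R column classes to generic type names.
r_to_schema_type <- function(x) {
  if (is.factor(x)) "string"
  else if (is.logical(x)) "boolean"
  else if (inherits(x, "Date")) "date"
  else if (inherits(x, "POSIXt")) "datetime"
  else if (is.integer(x)) "integer"
  else if (is.numeric(x)) "number"
  else "string"
}

# Hypothetical helper: one list entry per column, with user-supplied
# descriptions passed as a named character vector.
make_descriptor <- function(df, descriptions = character(0)) {
  stopifnot(is.data.frame(df))
  fields <- lapply(names(df), function(nm) {
    desc <- if (nm %in% names(descriptions)) descriptions[[nm]] else ""
    list(name = nm, type = r_to_schema_type(df[[nm]]), description = desc)
  })
  list(schema = list(fields = fields))
}

desc <- make_descriptor(iris, c(Species = "Iris species"))

# Serialize to JSON only when jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(jsonlite::toJSON(desc, auto_unbox = TRUE, pretty = TRUE))
}
```

Keeping the descriptor as a plain list means other exporters (for example, a DDI writer) could consume the same structure without depending on the JSON step.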

RMHogervorst avatar Apr 04 '16 09:04 RMHogervorst