animint2
Where do animint2's datasets come from, and where are their codebooks?
This is vaguely related to issue #97. I'm trying to generate a very simple example for the basic usage section and decided to use a default dataset. I noticed that animint2 contains a lot of datasets, 33 by my count. Some of them come from ggplot2, but I'm not sure where the rest are from.
Where are they from? (For example, where is the WorldBank dataset from?) And where can I find their corresponding codebooks?
As always, no rush in responding. Thanks in advance. 🐈
Where do the data sets come from? It should be documented on the man page, under "Source"; otherwise, I don't know. Codebooks? I don't know what you mean, but maybe I could help create one if you clarify?
Got it. I've spotted the "Source" subsection in the manual—thanks! :>
You've probably written codebooks before. That word's just jargon for metadata about the datasets. Codebooks usually describe the dataset's variables and how the data were collected. They're great for reproducibility, since variable names themselves are usually insufficient for describing the data.
The `diamonds` dataset has one. `WorldBank` and `montreal.bikes` don't. After some of the other website stuff is set up, I'd be down for writing codebooks together. It'd have to be together for at least some of the datasets, since I don't know the data for, e.g., `montreal.bikes`, and you do.
Or you could just write them yourself. Up to you, obviously. I'm not your boss. 🐈🐈🐈
EDIT: Corrected lots of typos.
I looked it up. "Codebook" is social sciences jargon. Sorry about that! I didn't realize the term wasn't universal in science.
sure, please open a PR with some edits to the man pages, and please put TODO where you think I should add some info.
Sure thing. :>
Status update: At least one of the datasets has its source in the comments of its data-raw script, which hopefully means that's the case for all of them. The script is `animint2/data-raw/economics.R`, and the source can be found here.
Note to self: Datasets can be found in `animint2/data-raw`.
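If the other `data-raw` scripts follow the same convention as `economics.R`, a quick scan of their comments could speed up the source check. A minimal sketch, assuming it is run from the package root and that sources appear as `http(s)://` URLs on comment-only lines. The function names are hypothetical, and URLs in trailing comments (after code on the same line) are deliberately not picked up.

```python
import re
from pathlib import Path

# Loose http(s) URL match; trailing punctuation may be included, which is
# fine for skimming.
URL_RE = re.compile(r"https?://\S+")


def comment_urls(r_source: str) -> list[str]:
    """URLs found on R comment lines (first non-blank character is '#')."""
    urls = []
    for line in r_source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            urls.extend(URL_RE.findall(stripped))
    return urls


def scan_data_raw(data_raw_dir: str = "data-raw") -> dict[str, list[str]]:
    """Map each data-raw/*.R script name to the URLs in its comments."""
    return {
        path.name: comment_urls(path.read_text(encoding="utf-8"))
        for path in sorted(Path(data_raw_dir).glob("*.R"))
    }


if __name__ == "__main__":
    for name, urls in scan_data_raw().items():
        print(name, urls if urls else "TODO: no source URL in comments")
```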
hi again, if this is still an issue, can you please link a PR with the TODOs? Otherwise, can you please close?
No problem. I've been preoccupied with the reference website, hence the delay. Unless you want me to prioritize this, I'll do it after I throw the website online. To-do for me:
- [x] Look through all the datasets and see if they have a source.
- [x] If they do, continue. If they don't, mark them with a TO-DO.
- [x] Look through all the datasets and see if they have a codebook.
- [x] If they do, continue. If they don't, mark them with a TO-DO.
- [x] Throw up a pull request with the edited datasets.
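The first four items could be partly automated. A rough sketch, assuming the dataset docs live in `man/*.Rd` and that dataset pages contain `\docType{data}`. Listing variables with `\describe{}` inside `\format{}` is the usual pattern for dataset docs, but treating that as "has a codebook" is my own rule of thumb, not an R convention. Function names are hypothetical, and the output only flags candidates; each page still needs a human look.

```python
import re
from pathlib import Path

# Top-level Rd sections begin at the start of a line, e.g. "\source{" or "\format{".
SECTION_RE = re.compile(r"^\\([A-Za-z]+)\s*\{", re.MULTILINE)


def rd_sections(rd_text: str) -> set[str]:
    """Names of the top-level sections found in one Rd file's text."""
    return set(SECTION_RE.findall(rd_text))


def audit_rd(rd_text: str) -> list[str]:
    """TODO labels for one dataset page (empty list if nothing is missing)."""
    sections = rd_sections(rd_text)
    todos = []
    if "source" not in sections:
        todos.append("TODO: source")
    # Rule of thumb: a codebook is a \format{} section that lists variables
    # with \describe{} (checked loosely anywhere in the file).
    if "format" not in sections or "\\describe{" not in rd_text:
        todos.append("TODO: codebook")
    return todos


def audit_man_dir(man_dir: str = "man") -> dict[str, list[str]]:
    """Audit every dataset page (files containing \\docType{data}) in man_dir."""
    report = {}
    for path in sorted(Path(man_dir).glob("*.Rd")):
        text = path.read_text(encoding="utf-8")
        if "\\docType{data}" in text:
            report[path.stem] = audit_rd(text)
    return report


if __name__ == "__main__":
    for name, todos in audit_man_dir().items():
        print(f"{name}: {', '.join(todos) or 'ok'}")
```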
Okay, website has been thrown online. Do this now, @ampurr. 🐈
Everything checked has a source attached (and therefore I won't need to attach a TODO to it):
- [x] breakpoints
- [x] change
- [x] ChromHMMiterations
- [x] climate
- [x] compare
- [x] diamonds (ggplot2)
- [x] economics (ggplot2)
- [x] economics_long (ggplot2)
- [x] faithfuld (ggplot2)
- [ ] FluView
- [x] FunctionalPruning
- [x] generation.loci
- [ ] intreg
- [x] luv_colours (ggplot2)
- [x] malaria
- [x] midwest (ggplot2)
- [x] mixtureKNN
- [x] montreal.bikes
- [x] mpg (ggplot2)
- [x] msleep (ggplot2)
- [x] PeakConsistency
- [x] pirates
- [x] presidential (ggplot2)
- [x] prior
- [x] prostateLasso
- [x] seals
- [x] TestROC
- [x] txhousing (ggplot2)
- [x] UStornadoes
- [x] VariantModels
- [x] vervet
- [x] WorldBank (is "copied from" sufficient?)
- [x] worldPop
Everything checked has a codebook attached (and therefore I won't need to attach a codebook TODO to it):
- [x] breakpoints
- [x] change
- [ ] ChromHMMiterations
- [ ] climate
- [ ] compare
- [x] diamonds
- [x] economics
- [x] economics_long
- [x] faithfuld
- [ ] FluView
- [ ] FunctionalPruning
- [ ] generation.loci
- [ ] intreg
- [x] luv_colours
- [ ] malaria
- [x] midwest
- [ ] mixtureKNN
- [ ] montreal.bikes
- [x] mpg
- [x] msleep
- [ ] PeakConsistency
- [ ] pirates
- [ ] presidential
- [ ] prior
- [ ] prostateLasso
- [ ] seals
- [ ] TestROC
- [x] txhousing
- [ ] UStornadoes
- [ ] VariantModels
- [ ] vervet
- [ ] WorldBank
- [x] worldPop
Note to self: Not all `.Rd` files are generated by roxygen2. Some files were manually created.
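One way to tell the two kinds apart: roxygen2 writes `% Generated by roxygen2: do not edit by hand` as the first line of every Rd file it generates, so a file without that header was most likely written by hand. A small sketch, assuming `man/` is the docs directory (function names are hypothetical):

```python
from pathlib import Path

# roxygen2 writes this comment as the first line of each Rd file it generates.
ROXYGEN_HEADER = "% Generated by roxygen2: do not edit by hand"


def is_roxygen_generated(rd_text: str) -> bool:
    """True if the Rd text starts with roxygen2's header comment."""
    return rd_text.startswith(ROXYGEN_HEADER)


def split_man_dir(man_dir: str = "man") -> tuple[list[str], list[str]]:
    """Return (generated, manual) lists of Rd file names in man_dir."""
    generated, manual = [], []
    for path in sorted(Path(man_dir).glob("*.Rd")):
        text = path.read_text(encoding="utf-8")
        (generated if is_roxygen_generated(text) else manual).append(path.name)
    return generated, manual


if __name__ == "__main__":
    generated, manual = split_man_dir()
    print("manually written:", ", ".join(manual) or "(none)")
```

This matters for where the TODOs go: for generated files they belong in the roxygen comments under `R/`, since re-running roxygen2 overwrites the `.Rd` file, while manually written files can be edited directly.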
Adding TODOs—a progress report:
- [x] ChromHMMiterations (edited .Rd file)
- [x] climate (edited .Rd file)
- [x] compare (edited .Rd file)
- [x] FluView (edited .Rd file)
- [x] FunctionalPruning (edited .Rd file)
- [x] generation.loci (edited .Rd file)
- [x] intreg (edited .Rd file)
- [x] malaria (edited .Rd file)
- [x] mixtureKNN (edited .Rd file)
- [x] montreal.bikes (edited .Rd file)
- [x] PeakConsistency (edited .Rd file)
- [x] pirates (edited .Rd file)
- [x] presidential (edited .Rd file)
- [x] prior (edited .Rd file)
- [x] prostateLasso (edited .Rd file)
- [x] seals (edited .Rd file)
- [x] TestROC (edited .Rd file)
- [x] UStornadoes (edited .Rd file)
- [x] VariantModels (edited .Rd file)
- [x] vervet (edited .Rd file)
- [x] WorldBank (edited .Rd file)
thanks, this is useful. I will look at that PR and edit when I get a chance.
Thank you! No rush. :>
hi @ampurr, for another project I have SAS codebooks defined as below:
if K2Q01_D in (1,2) then TeethCond_21 = 1;
if K2Q01_D = 3 then TeethCond_21 = 2;
if K2Q01_D in (4,5) then TeethCond_21 = 3;
if K2Q01_D = .M then TeethCond_21 = .M;
if K2Q01_D = 6 then TeethCond_21 = .L;
if SC_AGE_YEARS
do you know if there is any existing package to parse such SAS codebook data into R? I did a web search but did not find anything obvious.
Hey, @tdhock. :>
Unfortunately, my department never used SAS, so I don't have any special insight into your problem. Looking it up...
If you just need to read SAS data files (such as `.sas7bdat` files) into R, the haven package has a `read_sas()` function.
The SASmarkdown package will let you use SAS code with R Markdown.
A possible wacky chain solution:
- The SASPy Python package says that it lets you "exchange values between python variables and SAS macro variables," which seems promising.
- The reticulate R package lets you translate between R and Python objects.
- You might be able to use these two packages in conjunction.
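If no package turns up, a small hand-written parser may be enough, provided every recode line has the same one-condition shape as the lines above (`if VAR = value then NEW = code;` or `if VAR in (...) then NEW = code;`). Below is a rough Python sketch (Python so it could in principle be called from R via reticulate). The regex, the function name, and the CSV columns are my own assumptions. It is not a SAS parser: anything it doesn't recognize (compound conditions, `else if`, ranges, truncated lines) is reported back rather than guessed.

```python
import csv
import re
import sys

# One recode statement, e.g. "if K2Q01_D in (1,2) then TeethCond_21 = 1;"
# or "if K2Q01_D = .M then TeethCond_21 = .M;". Values stay strings so SAS
# special missing values such as .M and .L survive unchanged.
RECODE_RE = re.compile(
    r"^\s*if\s+(?P<src>\w+)\s*"
    r"(?:=\s*(?P<one>\S+?)|in\s*\((?P<many>[^)]*)\))"
    r"\s*then\s+(?P<dst>\w+)\s*=\s*(?P<val>\S+?)\s*;\s*$",
    re.IGNORECASE,
)


def parse_recodes(sas_code: str):
    """Return (rows, unparsed).

    rows: (source_var, source_value, target_var, target_value) tuples.
    unparsed: non-blank lines that do not match the one-condition pattern.
    """
    rows, unparsed = [], []
    for line in sas_code.splitlines():
        if not line.strip():
            continue
        m = RECODE_RE.match(line)
        if m is None:
            unparsed.append(line.strip())
            continue
        if m.group("many") is not None:
            values = [v.strip() for v in m.group("many").split(",")]
        else:
            values = [m.group("one")]
        # An "in (a,b)" condition becomes one lookup row per listed value.
        for value in values:
            rows.append((m.group("src"), value, m.group("dst"), m.group("val")))
    return rows, unparsed


if __name__ == "__main__":
    # Read SAS recode lines on stdin, write a CSV lookup table to stdout.
    rows, unparsed = parse_recodes(sys.stdin.read())
    writer = csv.writer(sys.stdout)
    writer.writerow(["source_var", "source_value", "target_var", "target_value"])
    writer.writerows(rows)
    for line in unparsed:
        print("could not parse:", line, file=sys.stderr)
```

The resulting CSV could then be read in R with `read.csv(path, colClasses = "character")` so that codes like `.M` and `.L` stay as text.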
Hope this helps. Good luck with your project. 🐈