Survey.jl
Testing using other survey datasets
So far, most of the test suite is limited to the API dataset. I suggest improving testing by using other publicly available survey datasets. The examples from Lumley's R survey textbook could be used (pg 7, section 1.2.1), e.g. NHANES, SHS, SIPP.
http://asdfree.com - Analyse Surveys Free has many real-world datasets and examples with respective R survey code.
@iuliadmtru We think a smaller, older version of the Scottish Household Survey is a great candidate for testing with the singledesign branch.
Detailed info and data scripts as well as downloads are available on this really old website.
In the Lumley Survey textbook, you will also find multiple examples of R design and code for the older version of SHS in Chapter 6, figure 6.2 onwards, pg 110-130.
The old PEAS exemplars include 6 surveys complete with R code, all small enough to be analysed locally on a modern computer without much hassle. Tests and designs can be translated from the code and explanations given there.
After taking a deeper look, I think we should export all of the survey RData files linked on those websites and add them to the assets/ folder. They aren't very big, only a few KB at most, with about 5-10 thousand observations and weights, cluster and strata.
PR #166 adds more datasets to use for testing. We should remove all the datasets within assets/ that we are not using and will not use for testing.
I added the datasets you mentioned, apart from the last two, which are neither clustered nor stratified. I think we have enough datasets now and should focus on testing. @ayushpatnaikgit I will start testing right after you push the latest version of bootweights.
Firstly, should we download these datasets on demand (e.g. with wget), or ship them as part of the package?
Can we check the licenses of those datasets, and whether they are GPLv3 or similar and hence can be distributed with Survey.jl?
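If we go with downloading rather than shipping, a minimal sketch of a download-once helper could look like the following. The function name `cached_dataset`, the cache location and the `fetch` keyword are all hypothetical and not part of Survey.jl; the only real API assumed is `Downloads.download(url, output)` from the Julia standard library.

```julia
using Downloads

# Hypothetical helper, not part of Survey.jl: return a local path for the
# dataset `name`, fetching it from `url` only when it is not already cached.
function cached_dataset(name::AbstractString, url::AbstractString;
                        cache_dir::AbstractString = joinpath(tempdir(), "survey_test_data"),
                        fetch = Downloads.download)
    mkpath(cache_dir)                  # create the cache directory if needed
    path = joinpath(cache_dir, name)
    isfile(path) || fetch(url, path)   # skip the download when the file exists
    return path
end
```

The `fetch` keyword is there only so the tests can swap in a stub and run without network access. Whether any given dataset may be fetched or redistributed this way still depends on the license check.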