asdfree
add the national longitudinal mortality study
correspondence on standard error replication
Hi, thanks, it makes sense that the calculation might not be straightforward. I am just looking for any programming syntax that performs this task [see my highlight in your e-mail below] for any set of NLMS years. From the NLMS bibliography [1], it looks like you've published at least six papers that include some measure of uncertainty over the past decade. I don't care which language it's written in, or if it's messy or uncommented, or if it's simply a standard deviation around a descriptive statistic [2] or something more complicated like a CI around a hazard ratio [3]. I would just appreciate some code that provides a more concrete understanding of the Census Bureau's technique with NLMS. Could you please provide me with this, or refer me to someone else on the CARRA team who might be able to help?
In case it's of any interest, my goal is to write up a detailed how-to for other users to work with NLMS, as I have already done with the datasets listed here [4].
Thank you!
[1] https://www.census.gov/did/www/nlms/publications/bibliography.html
[2] table #1 from http://bmcpublichealth.biomedcentral.com/articles/10.1186/1471-2458-14-705
[3] table #3 from http://www.hindawi.com/journals/jce/2013/490472/tab3/
[4] https://github.com/ajdamico/asdfree
Subject: Re: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Anthony, because we typically combine multiple files spanning several sampling designs for the analyses that we do, we do not have the code that you seek. It is not possible to determine how to make the adjustments you seem to need, since whether or not the PSUs and the Segments are self-representing changes across the different designs. The same is true of our public-use file: each of its files consists of sample data from several decades (designs). We use weighted data to account for the major sampling issues, combining data and adjusting the weights to account for the combined samples. We do not have software that performs the procedures to which you refer. Norm
Subject: RE: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Hi, this jackknifing or half-sample replication technique is exactly what I am after! Could you please share any snippets of code showing the analytic decisions that go into a variance calculation? I don't care if it's messy or hard to follow, I would just like to follow along more concretely with your variance calculation process. Thanks!
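To make the replication technique mentioned above concrete, here is a minimal R sketch of a jackknife replicate-weight variance calculation using the survey package. This is not the Census Bureau's actual procedure, just an illustration of the general technique; nlms_df, wt, and inddea are hypothetical stand-ins for an NLMS public-use extract.

```r
# a sketch of jackknife replication with the R survey package -- not the
# Census Bureau's actual procedure.  all variable names are hypothetical.
library(survey)

# fake stand-in data so the sketch runs end-to-end
set.seed(42)
nlms_df <- data.frame(inddea = rbinom(500, 1, 0.1), wt = runif(500, 500, 1500))

# treat the weighted file as a simple random sample, per the reply below
des <- svydesign(ids = ~ 1, weights = ~ wt, data = nlms_df)

# convert to a delete-one jackknife (JK1) replicate-weight design
jk_des <- as.svrepdesign(des, type = "JK1")

# any statistic computed on jk_des now carries a replication-based SE
svymean(~ inddea, jk_des)
```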
Subject: Re: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
The data in any of the files in the PUF are collections of multiple samples from around the identified date. These samples have been reweighted to account for the various sampling strategies used by CPS and ASEC, starting with the final weights from the original files. The weights of these original samples are then ratio-adjusted to accommodate the different sample sizes available to the NLMS for the particular PUF. Approximate standard error estimates would be determined from the usual formula for simple random samples using weighted data. The only other option would be to develop some replicate variance strategy, such as half samples or some jackknife variation, to generate variance estimates; this would depend on the particular application. To obtain variance estimates in software, we frequently rescale the weights by dividing by the overall average weight. This rescales the weights so that the weighted data sum to the sample size (the weights average out to 1, essentially looking like unweighted data, which is weighted data with weight 1). This maintains the relative weight relationships in the data, and software can be used to determine the standard error estimate without "exploding." Hope this helps. We do not have examples for you to work from. Norm
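The weight-rescaling step described above is concrete enough to sketch in a few lines of R. This is one plausible reading of the procedure, not confirmed CARRA code; the data frame and its columns are hypothetical.

```r
# a sketch of the weight-rescaling approach described above -- one plausible
# reading, not confirmed CARRA code.  variable names are hypothetical.

# fake stand-in data so the sketch runs end-to-end
set.seed(1)
nlms_df <- data.frame(y = rbinom(500, 1, 0.1), wt = runif(500, 500, 1500))

# rescale by the overall average weight: weights now average to 1 and
# sum to the sample size, preserving their relative relationships
nlms_df$scaled_wt <- nlms_df$wt / mean(nlms_df$wt)

y <- nlms_df$y
w <- nlms_df$scaled_wt
n <- length(y)

# weighted estimate of the mean
ybar <- sum(w * y) / sum(w)

# approximate SE from the usual simple-random-sample formula applied
# to the rescaled weighted data, as the reply above suggests
se <- sqrt(sum(w * (y - ybar)^2) / (sum(w) - 1) / n)

c(estimate = ybar, std_error = se)
```

Because the rescaled weights average to 1, the weighted data behave like an unweighted sample of size n in standard software, which is presumably what prevents the variance estimate from "exploding."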
Subject: RE: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Thanks, could anyone on your team provide an example of how to calculate a variance, standard error, or margin of error for any data point in the NLMS?
This is something that the US Census Bureau generally includes with any PUF, for example:
ACS: http://www2.census.gov/programs-surveys/acs/tech_docs/pums/estimates/pums_estimates_14.lst
CPS: https://www.census.gov/prod/2002pubs/tp63rv.pdf#page=107
AHS: http://www2.census.gov/programs-surveys/ahs/2009/tables/AHS_2009_National_Tables.xls
I am looking for any measure of uncertainty from NLMS that I can replicate precisely.
Thank you for the awesome data!
Subject: Re: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Anthony, I am assuming that you are working with our public-use file. Even though our full database is structured as you have read, the public-use file is made to look more like a sample from a particular time point. We have attempted to account for the sampling issues involved in the sample weight that has been provided. The data in any one file of the public-use file are a collection of data from several CPS or ASEC samples, reweighted to account for the various aspects of sampling that exist for these data. Estimates of variances using weighted data would be appropriate. Hope this helps, Norm
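Taken at face value, the advice above (treat the public-use file as weighted data and estimate variances accordingly) maps onto a short design specification in the R survey package. This is an interpretation of the reply, not an official recipe, and the variable names are hypothetical stand-ins.

```r
# a sketch of "estimates of variances using weighted data" via the R survey
# package -- an interpretation of the reply above, not an official recipe
library(survey)

# fake stand-in data so the sketch runs end-to-end
set.seed(2)
nlms_df <- data.frame(inddea = rbinom(500, 1, 0.1), wt = runif(500, 500, 1500))

# no clusters or strata specified: a weighted simple random sample
des <- svydesign(ids = ~ 1, weights = ~ wt, data = nlms_df)

# weighted estimate plus a design-based standard error
est <- svymean(~ inddea, des)
est

# a confidence interval around the weighted estimate
confint(est)
```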
Subject: RE: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Hi NLMS team, just wanted to follow up about this. Thanks!
Subject: how to calculate a variance, standard error, or confidence interval around NLMS estimates?
Dr. Johnson, thanks to your team for administering the NLMS. Your documentation mentions that "The NLMS is a unique research database in that it is based on a multistage stratified sample of the non-institutionalized population of the United States." I think that means I should be using a clustering and possibly a stratifying variable to calculate the standard errors, but I'm not sure which ones to use. Do you have any example SAS, SUDAAN, Stata, SPSS, or R code that correctly computes any statistic along with some measure of variance like an SE or CI? I think I'm just looking for a couple of lines of code. Here's the R code [1] that I use to exactly match the published variances in the Current Population Survey documentation. Thanks! (A sketch of the kind of design specification being asked about appears after the reference below.)
[1] https://github.com/ajdamico/asdfree/blob/master/Current%20Population%20Survey/replicate%20census%20estimates%20-%202011.R#L79-L89
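For comparison, this is the kind of clustered and stratified design specification the question above is asking about. psu_var and strata_var are hypothetical placeholders, not real NLMS columns, and the replies above suggest the public-use file may not carry usable design variables at all.

```r
# a sketch of the clustered + stratified design being asked about above.
# psu_var and strata_var are hypothetical placeholders, NOT real NLMS
# columns; per the replies above, the public-use file may not include them.
library(survey)

# fake stand-in data so the sketch runs end-to-end
set.seed(3)
nlms_df <- data.frame(
  inddea     = rbinom(500, 1, 0.1),
  wt         = runif(500, 500, 1500),
  strata_var = sample(1:5, 500, replace = TRUE),
  psu_var    = sample(1:50, 500, replace = TRUE)
)

des <- svydesign(
  ids     = ~ psu_var,     # clustering variable
  strata  = ~ strata_var,  # stratifying variable
  weights = ~ wt,
  nest    = TRUE,          # PSU ids repeat across strata in this fake data
  data    = nlms_df
)

# a statistic plus its design-based standard error
svymean(~ inddea, des)
```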