NSFG Data Issue for 1995 Female Response
Hello,
The RDS file produced by lodown for the National Survey of Family Growth (NSFG) for the 1995 Female Response dataset has empty values for both the stratum variable ("col_str") and the weight variable ("post_wt"). Below I show the 1995 codebook documentation for col_str:
Said column is not empty in the fixed-width 1995 .DAT file -- I've highlighted cols 12347-48, which aren't empty for any line:
Replication code to download 1995 RDS file:
require("data.table")
require("stringr")
require("lodown")
nsfg_cat = get_catalog("nsfg", output_dir = file.path( path.expand( "~" ) , "NSFG" ) )
nsfg_dt = data.table(nsfg_cat)
nsfg_dt = nsfg_dt[str_detect(output_filename, "1995FemRespData.rds"), ]
nsfg_cat = lodown("nsfg", nsfg_dt )
vroom::problems()
nsfg1995 = readRDS(file.path( path.expand( "~" ) , "NSFG", "1995FemRespData.rds" ))
unique(nsfg1995[["col_str"]])
Output:
nsfg catalog entry 1 of 1 stored at 'C:/Users/desk/Documents/NSFG/1995FemRespData.rds'
nsfg local download completed
Warning message:
One or more parsing issues, see `problems()` for details
> vroom::problems()
Error in vroom_materialize(x, replace = FALSE) :
argument "x" is missing, with no default
> nsfg1995 = readRDS(file.path( path.expand( "~" ) , "NSFG", "1995FemRespData.rds" ))
> unique(nsfg1995[["col_str"]])
[1] NA
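For anyone who wants to confirm the raw file independently of lodown, here is a minimal base R sketch that pulls columns 12347-48 straight out of each line. The helper name `extract_field` and the synthetic `demo_lines` are made up for illustration, and the commented-out path to a local copy of the .dat file is an assumption, not something lodown is known to leave on disk.

```r
# Hypothetical helper: cut a fixed-width field out of each raw line
# and trim surrounding blanks.
extract_field <- function(lines, start, end) {
  trimws(substr(lines, start, end))
}

# Synthetic stand-in lines so the sketch runs without the real file:
# 12346 blanks followed by a two-character field.
pad <- strrep(" ", 12346)
demo_lines <- c(paste0(pad, "12"), paste0(pad, " 7"), paste0(pad, "  "))

field <- extract_field(demo_lines, 12347, 12348)
n_blank <- sum(field == "")

print(field)
print(n_blank)

# With a local copy of the real file (path is an assumption):
# lines <- readLines("1995FemRespData.dat")
# table(extract_field(lines, 12347, 12348) == "")
```

If the count of blank fields on the real file is zero, the NA values are being introduced at the parsing step rather than coming from the data.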
Additional details:
(nsfg_dt)
full_url: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NSFG/1995FemRespData.dat
sas_ri: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NSFG/sas/1995FemRespSetup.sas
beginline: 1
output_filename: C:/Users/desk/Documents/NSFG/1995FemRespData.rds
any chance you could investigate inside of lodown:::lodown_nsfg to try to diagnose the issue? my bet is 1995FemRespSetup.sas has SAS syntax that SAScii::parse.SAScii isn't dealing with properly. thanks!
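One possible way to start on that diagnosis is to turn the parsed layout into start/end column positions and check whether col_str lands on columns 12347-48. The sketch below runs on a toy layout only. It assumes the data frame returned by SAScii::parse.SAScii has `varname` and `width` columns, with negative widths marking skipped gaps; `layout_positions` and `toy` are hypothetical names, and whether the real setup file parses at all is untested here.

```r
# Hypothetical helper: convert a width-based layout into start/end columns.
# Negative widths are treated as gaps that still consume columns.
layout_positions <- function(layout) {
  w <- abs(layout$width)
  layout$end <- cumsum(w)
  layout$start <- layout$end - w + 1
  # drop the gap rows, which have no variable name
  layout[!is.na(layout$varname), ]
}

# Toy layout standing in for the real parse result:
# a 5-column id, a 3-column gap, then a 2-column stratum field.
toy <- data.frame(
  varname = c("caseid", NA, "col_str"),
  width = c(5, -3, 2),
  stringsAsFactors = FALSE
)

pos <- layout_positions(toy)
print(pos)

# Against the real setup file (requires the SAScii package and network access):
# sas_ri <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NSFG/sas/1995FemRespSetup.sas"
# layout <- SAScii::parse.SAScii(sas_ri, beginline = 1)
# real_pos <- layout_positions(layout)
# real_pos[tolower(real_pos$varname) %in% c("col_str", "post_wt"), ]
```

If the positions computed for col_str on the real layout differ from 12347-48, that would point at the setup-file parsing rather than the download step.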