Seeking verification of which data files to download and where to put them
One reason (perhaps not the only one) I ran into the problem in issue #373 is that I don't have the needed PUF and CPS files in the right location on my system.
The PUF seems pretty clear from reading the code: I think I need to put puf2011.csv in the taxdata/data/ folder. (I haven't run the Makefile to see if it works because I don't have the right CPS files yet.)
The CPS is more challenging. I could not find cps_raw.csv.gz on my computer or on the internet, so I looked for the raw source CPS files. This seems to say that I need to get the following CPS files and put them in a folder. Unless I missed it, it does not say where to get the best versions of these files or where to put them:

I went to the CPS March Supplement datasets site and here is what I found (after unzipping):

As you can see, the 2014 and 2016 files have different names than the taxdata documentation says taxdata expects.
With this as background, here are my questions:
- Is the taxdata/data/ folder the right place to put puf2011.csv?
- Is there someplace that I can get cps_raw.csv.gz?
- Where can I get the proper raw CPS .dat files expected by taxdata?
- Is the taxdata/data/ folder the right place to put the CPS files?
Many thanks in advance.
Is the taxdata/data/ folder the right place to put puf2011.csv?
Yep.
Is there someplace that I can get cps_raw.csv.gz?
We don't save that file anywhere due to its size, but I can send it to you.
Where can I get the proper raw CPS .dat files expected by taxdata?
From NBER
Is the taxdata/data/ folder the right place to put the CPS files?
Yep!
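For anyone following along, here is a minimal sketch of a check that the files discussed above are where taxdata looks for them. It is not part of taxdata: the helper name `missing_files` is hypothetical, the two file names are the ones mentioned in this thread, and the relative path `taxdata/data` assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from this thread. The raw CPS .dat files are left out
# because their exact names vary by year and source.
EXPECTED_FILES = ["puf2011.csv", "cps_raw.csv.gz"]


def missing_files(data_dir, names=EXPECTED_FILES):
    """Return the entries of `names` that are not regular files in `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in names if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: run from the root of a taxdata checkout.
    missing = missing_files("taxdata/data")
    if missing:
        print("Missing from taxdata/data/:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

Running this before invoking the Makefile would show whether a failure is due to a missing input file rather than something else.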
Thanks! FWIW, the 2016 file on the NBER site is named "asec2016_pubuse_v3.dat" (same as on the Census site), rather than the "asec2016_pubuse.dat" that taxdata expects.
Because the default version of taxdata uses 2013, 2014, and 2015, but not 2016 or later, this is not an issue. However, it might be worth editing the documentation to note, for users, that later versions of the ASEC may have different names. Also, if I am reading the code properly, people who want to use different years of the ASEC only specify the year rather than the full file name, which means (I suspect) that they will have to rename files from the Census names to those used by taxdata, which might not be ideal from a reproducibility standpoint.
FWIW, the 2016 file on the NBER site is named "asec2016_pubuse_v3.dat" (same as on the Census site), rather than the "asec2016_pubuse.dat" that taxdata expects.
Thanks for letting me know! The code itself is good, but you're right that the documentation is incorrect. I'll update accordingly.
Also, if I am reading the code properly, people who want to use different years of the ASEC only specify the year rather than the full file name, which means (I suspect) that they will have to rename files from the Census names to those used by taxdata, which might not be ideal from a reproducibility standpoint.
Yes, they only need to specify years. For reproducibility, taxdata does have an option for specifying a path to the directory where all of the files are held, so as long as they don't rename any files after downloading and keep everything in one place, only needing to specify years should save some keystrokes.
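To illustrate the idea of resolving a file from a year alone, here is a rough sketch. It is not taxdata's actual lookup code: the function name `find_asec_file` and the glob pattern are hypothetical, chosen so that both `asec2016_pubuse.dat` and `asec2016_pubuse_v3.dat` would be found in a single directory without renaming the download.

```python
from pathlib import Path


def find_asec_file(data_dir, year):
    """Locate the ASEC .dat file for `year` inside `data_dir`.

    Hypothetical helper: the pattern below is an assumption about how the
    files are named, based only on the two 2016 names quoted in this thread.
    """
    pattern = f"asec{year}_pubuse*.dat"
    matches = sorted(Path(data_dir).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern} in {data_dir}")
    if len(matches) > 1:
        # Refuse to guess when several versions of the same year are present.
        names = ", ".join(m.name for m in matches)
        raise ValueError(f"ambiguous ASEC files for {year}: {names}")
    return matches[0]
```

Raising on an ambiguous match, rather than picking one silently, keeps the choice of file version explicit, which matters for reproducibility.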