Question about wealth vars
Hi Florian,
Thanks so much for this amazing package, it really saves time!
I am currently using this package for some data analysis, and I am a little confused about how it handles the wealth variables.
If I understand correctly, the wealth.vars argument was removed from the build.panel() function (in one of your latest commits), since "wealth variables are added to the main family files from 1999 and onwards". This gives me the impression that if I only want to build panels with the 1999-2019 waves, I can simply add the wealth variables to the list of family variables and pass them all to the fam.vars argument.
However, if I let the package download the data files itself, then for 1999, 2001, 2003, 2005 and 2007 (i.e. the years when the wealth vars sit in a separate wealth supplemental file) it seems to download only the main family data file and not the wealth data file. As a result, it cannot find the wealth variables I would like to include in my panel. So I am starting to wonder whether we should have the wealth.vars argument back, or am I just misunderstanding something here?
I guess the tricky part is that, with the data structure changing midway, it is not easy to keep a clear distinction between wealth.vars and fam.vars (or between wealth data files and family data files) and subset the two separately. One possible way around this might be: (1) automatically download the wealth data files whenever the problematic waves (1999-2007) are involved; (2) merge each wealth data file with the main family data file of the same year, which gives an 'amplified' main family file that is more comparable to the later waves (where the wealth vars are not separated from the main family file); (3) subset the 'amplified' family file, and then merge the selected columns into the cross-year individual file as usual.
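To make steps (2) and (3) concrete, here is a minimal sketch of the merge-then-subset logic in plain Python. It is not psidR code: the function names, the `interview` key and the `income`/`wealth` columns are all hypothetical, and the set of split years is simply the list of waves mentioned above. Merging the result into the cross-year individual file is left out.

```python
# Hypothetical sketch of "merge wealth into family file, then subset".
# None of these names come from psidR; they are for illustration only.

# Waves named in this thread as having a separate wealth file.
SPLIT_WEALTH_YEARS = {1999, 2001, 2003, 2005, 2007}


def merge_wealth(family_rows, wealth_rows, key):
    """Left-join wealth columns onto family rows using a shared key column."""
    wealth_by_id = {row[key]: row for row in wealth_rows}
    merged = []
    for fam in family_rows:
        combined = dict(fam)
        # Families without a wealth record keep only their family columns.
        for col, val in wealth_by_id.get(fam[key], {}).items():
            if col != key:
                combined[col] = val
        merged.append(combined)
    return merged


def subset(rows, key, wanted):
    """Keep the key plus the requested columns; missing values become None."""
    return [{col: row.get(col) for col in [key] + wanted} for row in rows]


def family_file_for(year, family_rows, wealth_rows, key, wanted):
    """Build the 'amplified' family file for split years, then subset it."""
    if year in SPLIT_WEALTH_YEARS:
        family_rows = merge_wealth(family_rows, wealth_rows, key)
    return subset(family_rows, key, wanted)
```

With this shape, the caller can pass family and wealth variable names in one list for every wave, and only the split years trigger the extra merge.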
Does it make any sense?
Many thanks in advance!
Youran
Besides, I noticed another thing with the all-year-individual-level variables (e.g. SEX OF INDIVIDUAL ER32000).
When we call the getNamesPSID() function with them, the variable name only shows up in the row for the most recent wave (2019 at the moment). I think this is actually due to the PSID cross-year-index spreadsheet itself (i.e. the crosswalk that this function uses as its reference). There is no 'All Year' column in the crosswalk; instead, the all-year vars are only listed in the column of the last year.
I guess we could solve this by adding a check to the getNamesPSID() function. We could hard-code a list of all-year individual-level vars, and if the given var is in this list, the blanks for the previous years would be filled in automatically.
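A rough sketch of that check, in plain Python rather than the package's own code: a dict mapping year to variable name stands in for one crosswalk row, and nothing is assumed here about how getNamesPSID() actually represents its result. The list contains only ER32000 because that is the one variable named in this thread; a real list would need to be compiled from the PSID documentation.

```python
# Hypothetical sketch of the "fill in all-year vars" check.
# Only ER32000 (SEX OF INDIVIDUAL) is taken from this thread.
ALL_YEAR_IND_VARS = {"ER32000"}


def fill_all_year(names_by_year):
    """Fill earlier years for an all-year var listed only under the last year.

    names_by_year maps year -> variable name, or None where the crosswalk
    cell is blank. Rows for ordinary variables are returned unchanged.
    """
    latest_year = max(names_by_year)
    name = names_by_year[latest_year]
    if name in ALL_YEAR_IND_VARS:
        # The crosswalk lists all-year vars only in the last column,
        # so copy that name to every year.
        return {year: name for year in names_by_year}
    return dict(names_by_year)
```

For example, `fill_all_year({2015: None, 2017: None, 2019: "ER32000"})` gives ER32000 for all three years, while a row whose last-year name is not in the list keeps its blanks.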
Interesting! I did not know about this all-years thing. I'm not sure the function that gets the var names is still the best way to do this. Any suggestions (and pull requests!) welcome.
Thanks for the quick reply! I have just created a pull request for this all-year-ind-var issue :))
Also, I wonder if you saw my first comment (about the wealth data files and the wealth vars) at the top of this thread. I closed it by mistake at one point and then re-opened it. That is the part I am still uncertain about and would like your confirmation on. Any thoughts?
Hello, any updates on the wealth vars part?
Sorry guys, I'm trying to get tenure. Happy to give some guidance to whoever wants to propose a PR though!