clean-slate-data
Consolidate data pipeline scripts
Create a single notebook or script for the data flow from the deidentified (but still raw) data to ‘prosecution_charges_detailed’. Right now, prosecution_charges is both an input and an output of two different scripts, with no clear indication of which should be run first. It would help to condense those steps into a single script.
- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-1_Raw.ipynb
- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-2_MergeCharges.ipynb
- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-2_MergeCharges_alt.ipynb (similar to the above, but incorporates parts of the following two scripts plus some additional edits to the expungeability information):
  - https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/new_expungability_info_join_emily.R
  - https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/sex_murder_columns.R
- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data_revised_joining.R
- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/Middlesex_Clean.ipynb
The general pipeline is described in the README in the /notebooks directory.
Thoughts on how to do this are also drafted here: Procedure for adding new MA prosecution data
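One possible way to make the run order explicit, sketched here only as an illustration: have each step declare which datasets it reads and writes, reject any step that reads and writes the same dataset (the current prosecution_charges situation), reject two steps writing the same dataset, and derive the order from those declarations. The step names and the intermediate dataset names (`clean_raw`, `charges_clean`, `deidentified_raw`, etc.) are hypothetical placeholders, not the repo's actual notebooks or files; only prosecution_charges and ‘prosecution_charges_detailed’ come from this issue.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    name: str                  # hypothetical step name
    inputs: tuple[str, ...]    # datasets this step reads
    outputs: tuple[str, ...]   # datasets this step writes


def run_order(steps: list[Step]) -> list[str]:
    """Return step names in an order where every input is produced before it is read."""
    producer: dict[str, str] = {}
    for step in steps:
        # A step that overwrites its own input makes re-runs order-dependent.
        overlap = set(step.inputs) & set(step.outputs)
        if overlap:
            raise ValueError(f"{step.name} both reads and writes {sorted(overlap)}")
        # Each dataset should have exactly one producing step.
        for out in step.outputs:
            if out in producer:
                raise ValueError(f"{out} is written by both {producer[out]} and {step.name}")
            producer[out] = step.name
    # Each step depends on the steps that produce its inputs;
    # inputs with no producer (e.g. the raw data) are treated as external files.
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline, listed out of order on purpose.
STEPS = [
    Step("expungeability", ("prosecution_charges",), ("prosecution_charges_detailed",)),
    Step("merge_charges", ("charges_clean",), ("prosecution_charges",)),
    Step("clean_raw", ("deidentified_raw",), ("charges_clean",)),
]

if __name__ == "__main__":
    print(run_order(STEPS))  # ['clean_raw', 'merge_charges', 'expungeability']
```

In a real consolidated script, each step would wrap the load/transform/save code from the existing notebooks and R scripts; the point of the sketch is that the order follows from declared inputs and outputs instead of from convention.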
@agathaalmunir, @mknotts623, and @linnalihe will review the scripts and write a summary of what each one does. Work date: Thursday, 8/26/2021.