
Consolidate data pipeline scripts

Open laurafeeney opened this issue 5 years ago • 1 comments

Create a single notebook / script for the data flow from deidentified-but-still-raw data to ‘prosecution_charges_detailed’. Right now, prosecution_charges is both an input and an output of two different scripts, with no clear indication of which should be run first. It would be helpful to condense those steps into a single script.

  • https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-1_Raw.ipynb
  • https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-2_MergeCharges.ipynb
  • https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data-2_MergeCharges_alt.ipynb (similar to the above but incorporates some of these two, and some additional edits to the expungeability):
      -- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/new_expungability_info_join_emily.R
      -- https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/sex_murder_columns.R
  • https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/MA_Data_revised_joining.R
  • https://github.com/codeforboston/clean-slate/blob/master/analyses/notebooks/Middlesex_Clean.ipynb

The general pipeline is described in the README on the /notebooks page.

Thoughts on how to do this are also drafted here: Procedure for adding new MA prosecution data
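
As a rough illustration of what "a single script" could look like, here is a minimal Python sketch in which every step is a function and the run order is written down in one list, so there is no ambiguity about what runs first. All function names, table contents, and the placeholder `expungeable` value are hypothetical; the real logic would come from the notebooks and R scripts linked above.

```python
from typing import Callable, Dict, List

Table = List[dict]          # a table as a list of row dicts
Tables = Dict[str, Table]   # named tables passed between steps


def load_raw(tables: Tables) -> Tables:
    # Hypothetical stand-in for reading the deidentified raw data.
    tables["raw"] = [
        {"case_id": 1, "charge": "A"},
        {"case_id": 2, "charge": "B"},
    ]
    return tables


def build_prosecution_charges(tables: Tables) -> Tables:
    # prosecution_charges is produced exactly once, from the raw table.
    tables["prosecution_charges"] = [dict(row) for row in tables["raw"]]
    return tables


def add_expungeability(tables: Tables) -> Tables:
    # Placeholder only: the actual expungeability rules live in the
    # existing notebooks / R scripts and are not reproduced here.
    tables["prosecution_charges_detailed"] = [
        {**row, "expungeable": None} for row in tables["prosecution_charges"]
    ]
    return tables


# The run order is explicit and lives in one place.
STEPS: List[Callable[[Tables], Tables]] = [
    load_raw,
    build_prosecution_charges,
    add_expungeability,
]


def run_pipeline(steps: List[Callable[[Tables], Tables]] = STEPS) -> Tables:
    tables: Tables = {}
    for step in steps:
        tables = step(tables)
    return tables


if __name__ == "__main__":
    result = run_pipeline()
    print(list(result))
```

With this layout, prosecution_charges is only ever an output of one step and an input to the next, which avoids the current situation where two scripts both read and write it.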

laurafeeney avatar Nov 11 '20 00:11 laurafeeney

@agathaalmunir, @mknotts623, and @linnalihe will review the scripts and write out a summary of what the scripts are doing / did. Date to work on this: Thursday 8/26/2021

linnalihe avatar Aug 12 '21 23:08 linnalihe