data_science_delivered
Observations from Ian on successfully delivering data science products
What sort of plan can be proposed to lay out a successful project from idea through to deployment? How to de-risk? Evaluate value, costs and risks. How to stage it. How...
* Add Gradient Boosted Tree Partial Dependence plot - does it agree with my 2 variable exploration with RF?
* Add more cumulative plots
* Reorder so the experimental stuff...
- handling missing data (imputation), dropping these rows, 'fixing them', changing the distribution without realising it (can I visualise this as a demo?)
- class imbalance
- massive imbalances (e.g....
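
The note above about imputation "changing the distribution without realising it" can be demonstrated with a small sketch. This is an illustration only, not the author's demo: the synthetic data, the 30% missing rate and the helper name `mean_impute` are all assumptions made for the example. It shows that filling gaps with the mean keeps the mean intact but shrinks the spread.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed values (hypothetical helper)."""
    observed_values = [v for v in values if v is not None]
    fill = statistics.fmean(observed_values)
    return [fill if v is None else v for v in values]


# Synthetic data (an assumption for this sketch): 1000 draws from a normal distribution.
rng = random.Random(0)
complete = [rng.gauss(50.0, 10.0) for _ in range(1000)]

# Knock out roughly 30% of the values completely at random.
with_missing = [None if rng.random() < 0.3 else v for v in complete]

observed = [v for v in with_missing if v is not None]
imputed = mean_impute(with_missing)

mean_observed = statistics.fmean(observed)
mean_imputed = statistics.fmean(imputed)
std_observed = statistics.pstdev(observed)
std_imputed = statistics.pstdev(imputed)

# The mean survives imputation, but the standard deviation is pulled down
# because every filled value sits exactly at the centre of the distribution.
print(f"mean: observed={mean_observed:.2f} imputed={mean_imputed:.2f}")
print(f"std:  observed={std_observed:.2f} imputed={std_imputed:.2f}")
```

Plotting histograms of `observed` and `imputed` side by side would make the spike at the mean visible, which is one way to turn this into the visual demo the note asks about.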
- if you lack constraints on datastores then duplicates will occur
- how to create setup.py
- Hypothesis can fuzz MySQL to make sure the data going in and back...
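
The point about constraints and duplicates can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database from the standard library as a stand-in for whatever datastore is actually in use; the table names `loose` and `strict` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one without constraints, one with a UNIQUE constraint.
conn.execute("CREATE TABLE loose (user_id INTEGER, email TEXT)")
conn.execute(
    "CREATE TABLE strict (user_id INTEGER, email TEXT, UNIQUE (user_id, email))"
)

# Sample rows (invented for this sketch); the third repeats the first.
rows = [(1, "a@example.com"), (2, "b@example.com"), (1, "a@example.com")]

# Without a constraint the duplicate is stored silently.
conn.executemany("INSERT INTO loose VALUES (?, ?)", rows)

# With a constraint the duplicate insert raises and can be counted or logged.
rejected = 0
for row in rows:
    try:
        conn.execute("INSERT INTO strict VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1

loose_count = conn.execute("SELECT COUNT(*) FROM loose").fetchone()[0]
strict_count = conn.execute("SELECT COUNT(*) FROM strict").fetchone()[0]
print(f"loose={loose_count} strict={strict_count} rejected={rejected}")
```

The same idea carries over to other databases, although the exception type and constraint syntax depend on the driver and engine.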