fingertipsR

predict_indicator()

Open • sebastian-fox opened this issue 7 years ago • 4 comments

Proposal: a function, predict_indicator(), that predicts the next year's value of an indicator. It would work as follows:

  • Takes an IndicatorID as input
  • Asks the user for the area type to predict for (eg, UTLA), then:
    • Extracts data for all indicators in the same profile(s) at the same geography
    • Identifies the latest year for the target indicator
    • Subsets the data frame to the latest year's information for all indicators
    • Creates flat, wide table of remaining indicators with variables for each previous year of data available for each indicator (eg, indicator_x_1yr_previous, indicator_x_2yr_previous, … , indicator_x_nyr_previous)
    • Trains and tests models on the second-latest year for the target indicator, possibly using multiple machine learning methods:
      • Lasso
      • GLM
      • SVM
      • Random forest
    • Uses the best model to predict the next year of data for the indicator
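
The steps above can be sketched roughly as follows. This is a minimal illustration in Python with made-up numbers, not the fingertipsR API: the `values` layout, the `make_wide` helper, and the two toy candidate models (an intercept-only mean and a one-variable least-squares fit) are all assumptions standing in for the lasso, GLM, SVM and random forest candidates listed above. It also assumes one particular reading of the train/test step: fit on rows whose target year is the second-latest year, score on the latest observed year, then predict the year after.

```python
from statistics import mean

# Made-up values[indicator][area][year]; not real Fingertips data.
values = {
    "target": {
        "A": {2014: 2.0, 2015: 3.0, 2016: 5.0},
        "B": {2014: 4.0, 2015: 5.0, 2016: 7.0},
        "C": {2014: 8.0, 2015: 9.0, 2016: 11.0},
        "D": {2014: 6.0, 2015: 7.0, 2016: 11.0},
    },
    "indicator_x": {
        "A": {2014: 1.0, 2015: 2.0, 2016: 3.0},
        "B": {2014: 2.0, 2015: 3.0, 2016: 4.0},
        "C": {2014: 4.0, 2015: 5.0, 2016: 6.0},
        "D": {2014: 3.0, 2015: 5.0, 2016: 7.0},
    },
}


def make_wide(target_year, n_lags=1):
    """One row per area: lagged columns for the non-target indicators, plus y if observed."""
    rows = {}
    for area, target_by_year in values["target"].items():
        row = {"y": target_by_year.get(target_year)}  # None when the year is not observed yet
        for indicator, by_area in values.items():
            if indicator == "target":
                continue
            for lag in range(1, n_lags + 1):
                row[f"{indicator}_{lag}yr_previous"] = by_area[area].get(target_year - lag)
        rows[area] = row
    return rows


def fit_mean(train):
    """Toy candidate 1: always predict the mean of the training target."""
    m = mean(r["y"] for r in train)
    return lambda row: m


def fit_linear(train, feature="indicator_x_1yr_previous"):
    """Toy candidate 2: ordinary least squares on a single lagged feature."""
    xs = [r[feature] for r in train]
    ys = [r["y"] for r in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[feature]


def rmse(model, rows):
    """Root mean squared error of a fitted model on rows with an observed y."""
    return mean((model(r) - r["y"]) ** 2 for r in rows) ** 0.5


# Latest year with data for the target indicator.
last = max(year for by_year in values["target"].values() for year in by_year)

train = list(make_wide(last - 1).values())  # second-latest year, lagged inputs
test = list(make_wide(last).values())       # latest observed year, lagged inputs

candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}
scores = {name: rmse(model, test) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

# Rows for the next year: inputs are the latest observed values, y is unknown.
future = make_wide(last + 1)
predictions = {area: candidates[best_name](row) for area, row in future.items()}

print(best_name, predictions)
```

Because every input is lagged by at least one year, the rows for the next year can be built entirely from data that has already been published, which is what makes the final prediction step possible in this sketch.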

sebastian-fox • Jun 19 '17 09:06

https://github.com/julianflowers/Data-science/blob/master/scripts/get_sui_data.R
https://github.com/julianflowers/Data-science/blob/master/suicide_prediction2.Rmd

julianflowers • Aug 24 '17 14:08

This is not so much forecasting as prediction (subtle, I know): prediction seems to be about fitting values to unseen data, whereas forecasting is about the future. To forecast next year we would need to be able to estimate all the model inputs as well...
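
A small sketch of the distinction, in Python with made-up numbers (the `indicator_x` series and both helper functions are hypothetical): a model that uses same-year inputs has nothing to feed in for a year that has not happened yet, whereas a model built only on lagged inputs can use values that are already observed.

```python
# Made-up values of one input indicator for a single area, by year.
indicator_x = {2014: 1.0, 2015: 2.0, 2016: 3.0}


def same_year_input(year):
    """Input needed by a model that uses the indicator observed in the same year."""
    return indicator_x.get(year)  # None if that year has not been observed


def lagged_input(year, lag=1):
    """Input needed by a model that uses the indicator observed `lag` years earlier."""
    return indicator_x.get(year - lag)


next_year = max(indicator_x) + 1

print(same_year_input(next_year))  # None: this input would itself have to be estimated
print(lagged_input(next_year))     # 3.0: already observed
```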

julianflowers • Aug 24 '17 14:08

Have been trying a few other models - xgboost, gbm, brnn...

xgboost seems to be very popular, though a bit fiddly. brnn is a Bayesian regularized neural network, which seems quite accurate.

julianflowers • Jan 08 '18 09:01

This looks really good. I'm starting to think this belongs in a different package. This package has been reviewed by some rOpenSci reviewers, and one of their comments was to reduce dependencies on other packages. That is a good suggestion and helps draw the boundaries of this package. I think we need to start developing the insights package internally...

sebastian-fox • Jan 08 '18 10:01