Max Ghenis

181 comments by Max Ghenis

CV is helpful for variable selection and for tuning other hyperparameters; for example, random forests do something like CV (via out-of-bag error) as part of the algorithm, and there are prebuilt CV methods for...
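
A minimal sketch of both points, assuming `scikit-learn` and toy `X`, `y` data (not the actual variables in question):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Toy data standing in for the real features/target.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Random forests get a CV-like error estimate "for free" via out-of-bag samples.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB R^2:", rf.oob_score_)

# Prebuilt k-fold CV, usable for model comparison or hyperparameter tuning.
scores = cross_val_score(rf, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("5-fold RMSE:", -scores.mean())
```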

You're comparing two models:

1. Random forests (just a single model)
2. Trinomial logit + two linear models for positive and negative logit predictions

In each case you're evaluating on...

Logit + 2 RFs could be a third model, but RF alone is worth testing and I'd personally start with a single RF vs. logit+LM. The single RF will perform...
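
A minimal sketch of that head-to-head (single RF vs. trinomial logit + two linear models), with toy data and placeholder names throughout:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
# Target is zero ~30% of the time, otherwise continuous (positive or negative).
y = np.where(rng.random(2000) < 0.3, 0.0, X[:, 0] * 3 + rng.normal(size=2000))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Model 1: a single random forest.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rmse_rf = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5

# Model 2: trinomial logit on sign(y), then separate linear models
# for the positive and negative classes.
sign_tr = np.sign(y_tr).astype(int)
logit = LogisticRegression(max_iter=1000).fit(X_tr, sign_tr)
lm_pos = LinearRegression().fit(X_tr[y_tr > 0], y_tr[y_tr > 0])
lm_neg = LinearRegression().fit(X_tr[y_tr < 0], y_tr[y_tr < 0])

sign_hat = logit.predict(X_te)
y_hat = np.zeros(len(X_te))  # predicted-zero class stays at zero
y_hat[sign_hat > 0] = lm_pos.predict(X_te[sign_hat > 0])
y_hat[sign_hat < 0] = lm_neg.predict(X_te[sign_hat < 0])
rmse_two_stage = mean_squared_error(y_te, y_hat) ** 0.5

print(f"RF RMSE: {rmse_rf:.3f}  logit+LM RMSE: {rmse_two_stage:.3f}")
```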

> This would probably lower the MSE but would avoid the "zero-fuzzing" that occurs when using the average of the entire row for all observations. You should select a random...

> I am still a bit unclear on the motivation behind inserting randomness, which underlies some of my incorrect assumptions going into this process (directly imputing predicted categories in...

Here's an [example](https://github.com/shahejokarian/regression-prediction-interval/blob/master/linear%20regression%20with%20prediction%20interval.ipynb) using `sklearn` for prediction intervals. It's not clear whether it works for regularized models like Lasso.
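
For reference, `statsmodels` provides OLS prediction intervals directly (the linked notebook instead builds them by hand around `sklearn`); a minimal sketch with made-up data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

# Fit OLS and request 95% prediction intervals for new observations.
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
pred = fit.get_prediction(X).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]].head())
```

These closed-form intervals lean on OLS assumptions, which is why a penalized model like Lasso doesn't get them for free, per the caveat above.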

> Second, the EIC variable is categorical (like the MARS variable), so you should convert EIC into a set of dummy variables (omitting the first category) just like you did...
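
For concreteness, a sketch of that dummy encoding with `pandas` (the values below are made up; only the EIC and MARS column names come from the discussion):

```python
import pandas as pd

df = pd.DataFrame({"EIC": [0, 1, 2, 3, 0], "MARS": [1, 2, 1, 4, 2]})

# One dummy per category, dropping the first to avoid collinearity
# with the intercept (the same treatment applied to MARS).
dummies = pd.get_dummies(df, columns=["EIC", "MARS"], drop_first=True)
print(dummies.head())
```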

Thanks Avi, does this chart represent the results correctly?

Method | RMSE on full test data | RMSE on positive test data | RMSE on negative test data
-- | -- | -- | --
...

> long-term I understand the goal would be to use a random tree (if RF) or random point from the CDF (if OLS) so that the imputation is stochastic, correct?...
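
A sketch of both flavors of stochastic imputation, assuming a fitted `sklearn` forest and an OLS fit (all names and data are toy placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=1000)

# RF flavor: impute from one randomly chosen tree rather than
# the forest-wide average.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
tree = rf.estimators_[rng.integers(len(rf.estimators_))]
rf_draw = tree.predict(X[:5])

# OLS flavor: a random point from the predictive CDF, approximated
# here as Normal(y_hat, sigma) with sigma from the residuals.
ols = LinearRegression().fit(X, y)
y_hat = ols.predict(X[:5])
sigma = np.std(y - ols.predict(X), ddof=X.shape[1] + 1)
ols_draw = rng.normal(y_hat, sigma)

print(rf_draw, ols_draw)
```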

In addition to parsimony, is another rationale for nixing small weights a concern about overfitting? If so, LASSO regression comes to mind, which reduces the risk of overfitting by including an...
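
A quick illustration of that zeroing behavior with `scikit-learn`'s cross-validated Lasso (toy data, not the actual weights in question):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
# Only the first two features matter; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)

# The L1 penalty shrinks small weights exactly to zero, with the
# penalty strength chosen by cross-validation.
lasso = LassoCV(cv=5).fit(X, y)
print(lasso.coef_.round(3))
```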