gbm.auto

Simplified Model Relative Influences

Open nffarabaugh opened this issue 3 years ago • 10 comments

Is there a way to access the relative influence information for the parameters that are included in the simplified model? It does not appear in the report CSV. Similarly, is there a way to autogenerate plots for these?

Cheers!

nffarabaugh avatar Mar 23 '22 18:03 nffarabaugh

Mate, please could you send me your run script and your data (or a representative chunk so it'll run)? Thanks.

SimonDedman avatar Mar 29 '22 04:03 SimonDedman

gbm.auto: the report around L1036 is populated by Bin_Bars$var, which comes from L858: summary(get(Bin_Best_Model)). Bin_Best_Model (from L686) can be the best simplified bin model if worthy. So L858 should populate L1036 with the simplified model, and thus produce simplified bars and simplified best vars / rel inf Report entries.

SimonDedman avatar Mar 29 '22 04:03 SimonDedman

For sure! I have popped my script and the CSV you will need below. Obviously don't share it around. Thanks for the help!

Cheers,
N. Frances Farabaugh

nffarabaugh avatar Mar 29 '22 19:03 nffarabaugh

I just tried the first run with only tc 1, lr 0.01 and bf 0.5. The best combo was the unsimplified version. Even though Report.csv lists the simp predictors dropped and kept, if the best simp run doesn't lower the deviance it won't outcompete the existing best unsimplified BRT run, so the relative influence values for the simp model aren't included because they're not relevant.

You can tell which model was chosen as best under "Best Gaussian BRT"; if this doesn't end in "_simp" then there's no issue. Please let me know tc lr bf combos for examples where this isn't the case, and the simp run wins but its best variables & their relative influence scores aren't produced correctly.
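The "_simp" check described above can also be done programmatically. A minimal sketch in base R, assuming the best-model name has already been read out of Report.csv as a string; `is_simp_best` and the example names are hypothetical, not part of gbm.auto:

```r
# Hypothetical helper: TRUE if the chosen best model is a simplified run,
# judged only by the "_simp" suffix described above.
is_simp_best <- function(best_model_name) {
  endsWith(best_model_name, "_simp")
}

# Made-up example names, for illustration only
is_simp_best("Gaus_BRT.tc5.lr0.01.bf0.5")       # FALSE
is_simp_best("Gaus_BRT.tc5.lr0.01.bf0.5_simp")  # TRUE
```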

One confusing element is that simp_dops_gaus.jpeg shows a negative change in predictive deviance for the removal of 1, 2, 6, 7, & 8 variables, with 8 being the greatest reduction. Intuitively this would mean the model with 8 dropped variables was better than the 'parent' combo with all variables retained, but actually simp is only selected if its self.statistics$correlation score (i.e. training data correlation) is better.
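The selection rule described above can be sketched with mock objects. This is not gbm.auto's actual code: `pick_best`, `parent` and `simp` are hypothetical, and the lists only mimic the `self.statistics$correlation` slot named above.

```r
# Mock stand-ins for fitted BRT objects; only the slot discussed above is filled in
parent <- list(self.statistics = list(correlation = 0.71))
simp   <- list(self.statistics = list(correlation = 0.68))

# Hypothetical rule: keep the simplified model only if its training data
# correlation beats the parent's, regardless of what the deviance plot shows.
pick_best <- function(parent, simp) {
  if (simp$self.statistics$correlation > parent$self.statistics$correlation) {
    "simp"
  } else {
    "parent"
  }
}

pick_best(parent, simp)  # "parent": the simplified model does not win here
```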

LMK how you get on, and please close this if this answers everything. Cheers!

SimonDedman avatar Mar 30 '22 00:03 SimonDedman

Thanks, I think this was an error in my understanding. So far none of my models have simp as the "best" model. I was confused because of the negative change in the predictive deviance (simp_dops_gaus.jpeg). Thanks for the help!

nffarabaugh avatar Mar 30 '22 14:03 nffarabaugh

Hello, it seems this is an issue even when the best model is a simplified model. I have attached the generated report and code below.

Report_carangidae.csv
Self_CV_Statistics.csv

gbm.auto(
  grids = NULL,
  samples = wide.df1 %>% filter(site != "Nuka Hiva"),
  expvar = c("temp", "ave_npp", "depth", "visibility", "topo", "pop.dens",
             "bait", "time.no.bait", "isl_grp", "Season", "lagoon.size"), # fix to final variables
  resvar = "carangidae_maxN_a",
  tc = c(5), # add combos you want to see for initial runs and it will try each; doesn't run the whole gamut like the loops do
  lr = c(0.0005),
  bf = c(0.55),
  n.trees = 50,
  ZI = "CHECK",
  fam1 = c("bernoulli", "binomial", "poisson", "laplace", "gaussian"),
  fam2 = c("poisson"),
  # simp = TRUE, # Change to true
  gridslat = 2,
  gridslon = 1,
  multiplot = TRUE,
  cols = grey.colors(1, 1, 1),
  linesfiles = TRUE, # change to true for final run
  smooth = TRUE,
  savedir = "~/Documents/My Documents/FinPrint French Poly/Analysis/DataExploration_03_2022/Teleosts",
  savegbm = TRUE, # change to true for final runs
  loadgbm = NULL,
  varint = TRUE,
  map = TRUE,
  shape = NULL,
  RSB = TRUE,
  BnW = TRUE,
  alerts = TRUE, # this is the noise alerts
  pngtype = c("quartz"), # "quartz" for Mac; for Windows use "cairo-png"
  gaus = TRUE,
  MLEvaluate = TRUE,
  brv = NULL,
  grv = NULL,
  Bin_Preds = NULL,
  Gaus_Preds = NULL)

nffarabaugh avatar Oct 11 '22 16:10 nffarabaugh

gaus: L1143 & L1144:

Report[1:(length(Gaus_Bars[,1])),(reportcolno - 2)] <- as.character(Gaus_Bars$var)
Report[1:(length(Gaus_Bars[,2])),(reportcolno - 1)] <- as.character(round(Gaus_Bars$rel.inf), 2)
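One detail that may be worth checking in the second line: the `2` sits outside `round()`, so it is passed to `as.character()` rather than used as the number of decimal places. A base R sketch with made-up values (`rel_inf` is hypothetical) shows the difference; that two decimal places were intended is an assumption.

```r
rel_inf <- c(12.3456, 7.891)  # made-up relative influence values

as_written  <- as.character(round(rel_inf), 2)  # rounds to whole numbers
with_digits <- as.character(round(rel_inf, 2))  # rounds to 2 decimal places

as_written   # "12" "8"
with_digits  # "12.35" "7.89"
```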

Bin is L1067:75

L887: if (gaus) {Gaus_Bars <- summary(get(Gaus_Best_Model), ...) so Bin_Bars/Gaus_Bars are already simp if simp was better. So why are they printing all of the rel.inf values if most of the vars got dropped?

Gaus_Bars <- summary(get(Gaus_Best_Model),
                     cBars = length(get(Gaus_Best_Model)$var.names),
                     n.trees = get(Gaus_Best_Model)$n.trees,
                     plotit = FALSE, order = TRUE, normalize = TRUE, las = 1, main = NULL)
write.csv(Gaus_Bars, file = paste0("./", names(samples[i]), "/Gaussian BRT Variable contributions.csv"), row.names = FALSE)

Output csv colnames: var, rel.inf. I.e. not cBars nor n.trees. Odd.

L668: simplification. L671: Gaus_Best_Simp assigned gbm object AFTER simplification, so should have extra variables dropped?
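A quick way to test that question on a saved model, sketched with mock objects. `parent_model`, `simp_model` and `bars` are hypothetical stand-ins; the only assumption is that each model carries a `var.names` vector, as used in the `summary()` call above.

```r
# Mock stand-ins: a parent model with four predictors, a simplified one with two
parent_model <- list(var.names = c("temp", "depth", "bait", "topo"))
simp_model   <- list(var.names = c("temp", "depth"))

# Mock relative influence table of the kind written to the CSV
bars <- data.frame(var = c("temp", "depth", "bait", "topo"),
                   rel.inf = c(40, 30, 20, 10),
                   stringsAsFactors = FALSE)

# Variables dropped by simplification
dropped <- setdiff(parent_model$var.names, simp_model$var.names)

# Any dropped variable still present in the table suggests the table was
# built from the parent model rather than the simplified one
leaked <- intersect(bars$var, dropped)
leaked  # "bait" "topo"
```

An empty `leaked` would suggest the table really was built from the simplified object.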

See notes from Bonnie, who had the same issue, at L674:681. Testing replacements at L680 & 645 now.

SimonDedman avatar Oct 11 '22 17:10 SimonDedman

Pushed change. The model re-run by Frances didn't need simplifying, so the change is untested; dangerzone.

SimonDedman avatar Oct 12 '22 15:10 SimonDedman

NFF any update on this, did the change solve the issue? If so please mark as closed. Cheers!

SimonDedman avatar Nov 21 '22 22:11 SimonDedman