model-implementation-principles Check for integer values for arguments

trafficstars

I don't know if this fits within this repo so please bear with me.

I have been thinking about what happens when we check the type of the arguments we pass into our models, and for the most part that task is fairly easy. But I feel we have overlooked the case when we are concerned with argument such as times , epochs and mtry that require integers. Often the approach is to round down the double silently which I think is a little dangerous.

Examples

library(ranger)

ranger(
  mpg ~ ., 
  data = mtcars, 
  mtry = 3.8, 
  num.trees = 20.2, 
  importance = 'impurity'
)
#> Ranger result
#> 
#> Call:
#>  ranger(mpg ~ ., data = mtcars, mtry = 3.8, num.trees = 20.2,      importance = "impurity") 
#> 
#> Type:                             Regression 
#> Number of trees:                  20 
#> Sample size:                      32 
#> Number of independent variables:  10 
#> Mtry:                             3 
#> Target node size:                 5 
#> Variable importance mode:         impurity 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       5.142068 
#> R squared (OOB):                  0.8584392

library(parsnip)

rand_forest(mode = "regression", mtry = 2.3) %>%
  fit(mpg ~ ., 
      data = mtcars,
      engine = "ranger")
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(formula = formula, data = data, mtry = 2.3, num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  500 
#> Sample size:                      32 
#> Number of independent variables:  10 
#> Mtry:                             2 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       6.186759 
#> R squared (OOB):                  0.829679

I feel like this problem is already being looked at in the vctrs package with vec_cast but might be something you should consider when implementing models in R.

Oct 20 '18 20:10 EmilHvitfeldt

I like this as a general principle for writing functions, and vctrs's lossy cast warning functions are pretty nice for this. What's nice about it is that integerish things like the numeric 2 don't trigger the warning but 2.2 would. I personally don't think this is specific enough to be a model principle, but could see it living as a general principle in the functions chapter of the next version of advanced R (but I have no control over that). Maybe Max has a different opinion though because it is a useful point overall.

Oct 20 '18 21:10 DavisVaughan

Certainly I think we should have a section about input validation, potentially listing some easy tests to do to avoid mistakes like this.

Oct 22 '18 19:10 alexpghayes

model-implementation-principles model-implementation-principles copied to clipboard

Check for integer values for arguments

model-implementation-principles
model-implementation-principles copied to clipboard