Adam Li
If it helps the discussion, I found this [old issue](https://github.com/microsoft/LightGBM/issues/2921) in LightGBM, which seems to reflect their docs (I'm unsure because I can't find a specific line mentioning how they...
I imagine there is some overhead due to the potential to query an RNG many times. Though this would also potentially be an edge case, since it implies there are...
I think a warning would be nice, but an error message might be overkill, because when a NaN pops up in the testing dataset (but not in training), ideally our use...
Responding to @glemaitre

> In terms of an application use case, I'm also wondering if we should not error/warn if a user starts to provide missing values at test time...
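To make the suggestion concrete, here is a minimal sketch of warning rather than erroring when missing values first appear at predict time. This is not scikit-learn's actual implementation; the helper name and the `seen_missing_during_fit` flag are hypothetical.

```python
import warnings

import numpy as np


def check_unexpected_missing(X, seen_missing_during_fit):
    """Hypothetical helper: warn when NaNs show up only at predict time."""
    X = np.asarray(X, dtype=float)
    if not seen_missing_during_fit and np.isnan(X).any():
        warnings.warn(
            "X contains missing values at predict time, but none were seen "
            "during fit; missing values will be routed by a fallback rule.",
            UserWarning,
        )
```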
Relevant paper showing empirical evidence that sending samples to the majority child node is not as good as "random" when the sample contains an unseen category during training. In...
I spoke with @betatim today about this issue, and to summarize, I think a good strategy is the following (hopefully he agrees :p): if no missing values are encountered during training,...
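As a rough illustration of the "flip a coin" rule, here is a sketch only; the routing function is made up for the example and is not scikit-learn's tree traversal code.

```python
import numpy as np

rng = np.random.default_rng(0)


def go_left_when_missing(rng):
    # "Flip a coin": route a sample whose feature value is missing to the
    # left or right child with equal probability, used when the split saw
    # no missing values during training.
    return rng.random() < 0.5


print([go_left_when_missing(rng) for _ in range(5)])
```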
> > If no missing values are encountered during training, then flip a coin and set the missing value traversal to be random.
>
> Is there literature that backs this...
I'm happy to help address this issue. For the scikit-learn core devs, can someone educate me on why we should store proportions? I see a few paths forward: 1. when...
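For context on the counts-versus-proportions question, here is a minimal sketch, assuming a scikit-learn version where `tree_.value` stores per-node class proportions, of how (weighted) counts could still be recovered via `tree_.weighted_n_node_samples`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Assuming tree_.value stores class proportions (shape: n_nodes, n_outputs,
# n_classes), rescaling by the weighted number of samples reaching each node
# recovers the weighted class counts.
proportions = clf.tree_.value
counts = proportions * clf.tree_.weighted_n_node_samples[:, np.newaxis, np.newaxis]
print(counts[0])  # root node: roughly the class distribution of y, [50., 50., 50.]
```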
> * first fix the example to have a consistent explanation and make `value` in `plot_tree` the same as `tree_.value` (you probably need fixed-width formatting similar to `{tree_.value:.3f}` to...
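On the formatting point, `plot_tree` already exposes a `precision` parameter that controls how many digits are shown in the node boxes; a usage sketch (not the fix itself) would look like:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(8, 5))
# precision sets the number of digits used for value, impurity and threshold,
# which helps keep the displayed `value` consistent with `tree_.value`.
plot_tree(clf, precision=3, ax=ax)
plt.show()
```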
We can wait to see the sparse refactor then. I think a warning message is still warranted when the user calls the relevant functions (not during import).
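A minimal sketch of "warn on use, not on import" (the function name and message are hypothetical, not an actual scikit-learn API):

```python
import warnings


def use_sparse_feature(X):
    # Hypothetical example: emit the warning only when the relevant function
    # is actually called, rather than at module import time.
    warnings.warn(
        "Sparse input support is pending a refactor; behavior may change.",
        FutureWarning,
    )
    return X
```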