IntroML
Added a check such that grow_tree() can handle duplicate data.
In the Hitters dataset, it is quite common for several players to have the same value on some variables.
I don't get this. Do you know if this is a bug? I tried my function without the check, and it worked (see the teacher repo).
I don't remember exactly, but this is what I wrote last year:
Add && nrow(unique(X[S_m[[1]],]))>1 here to make sure there are at least 2 unique data points?
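A minimal sketch of that guard as a standalone helper, assuming X is the feature matrix and the index vector (S_m[[1]] above) holds the rows of the region about to be split. The helper name has_two_unique_rows and the toy matrices are hypothetical. One detail worth noting if X is a matrix: X[idx, ] with a single index drops to a plain vector, so nrow() returns NULL and the comparison becomes logical(0); drop = FALSE avoids that.

```r
# Hypothetical helper: TRUE if the rows of X indexed by idx contain at
# least two distinct data points, i.e. a split is possible at all.
has_two_unique_rows <- function(X, idx) {
  # drop = FALSE keeps a one-row subset as a matrix, so nrow() returns 1
  # instead of NULL
  nrow(unique(X[idx, , drop = FALSE])) > 1
}

X <- matrix(c(1, 1, 1, 2, 2, 2), ncol = 2)  # three identical rows (1, 2)
X2 <- rbind(X, c(3, 4))                      # adds one distinct row

has_two_unique_rows(X, 1:3)   # FALSE: all rows identical
has_two_unique_rows(X2, 1:4)  # TRUE
has_two_unique_rows(X2, 4)    # FALSE: a single row
```

Counting unique rows rather than rows is what distinguishes this from a plain size check: X has 3 rows but only 1 distinct data point.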
Even this is not enough.
Given l = 5, 10 rows of which 9 are identical would return a split of (9, 1). Since 1 < 5 that is a problem, but it will at least not break the predict_with_tree() function.
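The (9, 1) case can be reproduced on a toy vector. This sketch assumes l is the minimum number of observations per leaf and that splits have the form x < s versus x >= s, with candidate thresholds at midpoints between consecutive distinct values; that is a common convention, not necessarily what grow_tree() does. The names split_sizes, vals and thresholds are hypothetical.

```r
# Hypothetical toy version of the l = 5 example: 10 observations of a
# single variable, 9 of which share the same value.
x <- c(rep(3, 9), 7)
l <- 5  # assumed: minimum number of observations per leaf

# Sizes of the two children when splitting at threshold s.
split_sizes <- function(x, s) c(left = sum(x < s), right = sum(x >= s))

# Candidate thresholds: midpoints between consecutive distinct values.
vals <- sort(unique(x))
thresholds <- (head(vals, -1) + tail(vals, -1)) / 2  # only 5 here

sizes <- split_sizes(x, thresholds[1])
sizes            # left = 9, right = 1
all(sizes >= l)  # FALSE: the only possible split violates l = 5
```

The region passes the unique-rows check (two distinct values), yet its only split still leaves a child smaller than l, so a separate leaf-size check would be needed to reject it.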
I think there was some issue with predict_with_tree() getting stuck in an infinite loop, or something like that.
OK. I'll merge your other PR. Let's see if that is enough.