Added a check such that grow_tree() can handle duplicate data.

Open andreasostling opened this issue 2 years ago • 3 comments

In the Hitters dataset, it is quite common for several people to have the same value on some variables.

andreasostling, Nov 14 '23 20:11

I don't get this. Do you know if this is a bug? I tried my function without the check, and it worked (see the teacher repo).

MansMeg, Nov 15 '23 19:11

I don't remember exactly, but this is what I wrote down last year:

Add: && nrow(unique(X[S_m[[1]],]))>1 here to make sure there are at least 2 unique data points?
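A minimal sketch of what that guard evaluates to, using made-up toy data. The contents of X and S_m here are assumptions for illustration (S_m is assumed to be a list whose first element holds the row indices of the current region); drop = FALSE is an addition not in the original snippet, used so the subset stays a matrix even when the region has a single row.

```r
# Hypothetical toy data: 10 observations, 2 features, every row equal to (1, 2).
X <- matrix(rep(c(1, 2), each = 10), ncol = 2)

# Assumed structure: first element holds the row indices of the current region.
S_m <- list(1:10)

# The proposed guard: only consider splitting if the region
# contains at least 2 distinct rows.
has_distinct_rows <- nrow(unique(X[S_m[[1]], , drop = FALSE])) > 1
print(has_distinct_rows)  # FALSE: all rows are duplicates, so no split is attempted

# Changing a single row makes the guard pass.
X2 <- X
X2[10, ] <- c(5, 6)
has_distinct_rows2 <- nrow(unique(X2[S_m[[1]], , drop = FALSE])) > 1
print(has_distinct_rows2)  # TRUE
```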

Even this is not enough.

Given l=5, 10 rows of which 9 are identical would return a split of sizes (9,1). The leaf with 1<5 observations is a problem, but it will at least not break the predict_with_tree() function.
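The (9,1) case can be reproduced with a small sketch. The data and the midpoint threshold below are assumptions for illustration only, not the splitting rule used by grow_tree().

```r
# Hypothetical single-feature region: 9 identical values and 1 distinct value.
x <- c(rep(3, 9), 7)
l <- 5  # minimum leaf size, as in the comment above

# With only two distinct values, any separating threshold lies between 3 and 7.
# The midpoint is used here purely for illustration.
s <- mean(unique(x))       # 5
left_size  <- sum(x <= s)  # 9
right_size <- sum(x > s)   # 1

# The unique-values guard passes, yet one child is smaller than l.
passes_unique_guard <- length(unique(x)) > 1
violates_leaf_size  <- min(left_size, right_size) < l
print(c(left_size, right_size))
print(passes_unique_guard)
print(violates_leaf_size)
```

So the uniqueness check alone rules out regions with no possible split, but it does not by itself enforce the minimum leaf size.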

I think there was some issue with predict_with_tree() getting stuck in an infinite loop or something like that.

andreasostling, Nov 15 '23 22:11

Ok. I'll merge your other PR. Let's see if that is enough.

MansMeg, Nov 16 '23 05:11