random-forest-importances

Added multiprocessing for oob_importances (v2)

Open RohanBhandari opened this issue 5 years ago • 4 comments

Continuation of PR #20, addressing Issue #19

RohanBhandari avatar Mar 21 '19 18:03 RohanBhandari

Excellent. I will take a look this afternoon.

parrt avatar Mar 21 '19 18:03 parrt

Hi. Okay, I tried it out and it does seem to get the same answers. The only problem is that it takes longer for me. Fundamentally, creating a separate process and having to copy the data over is going to be slower than single threading for anything other than small data sets. Apparently in Python 3.8 we are going to get proper shared memory, so I think we should wait until then. I will keep this PR open because it might simply work automatically, or with a small tweak, at the next release of Python. A quick check shows that they are at alpha 2, so it's unclear when it will come out. The release candidate seems to be scheduled for the end of September 2019.
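
For reference, here is a minimal sketch of the shared memory idea, assuming the standard library `multiprocessing.shared_memory` module that arrived in Python 3.8. The helper names (`publish`, `read_sum`, `demo`) are hypothetical and not part of this PR. The sketch runs in a single process to stay short; a worker process would attach by name in the same way instead of receiving a pickled copy of the data.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Copy a non-empty list of floats into a new shared memory block (one copy, done once)."""
    shm = shared_memory.SharedMemory(create=True, size=len(values) * 8)
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm


def read_sum(name, n):
    """Attach to an existing block by name and sum its first n floats."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{n}d", shm.buf, 0))
    finally:
        shm.close()  # detach this handle; the block itself stays alive


def demo():
    values = [1.0, 2.0, 3.5]
    shm = publish(values)
    try:
        # Only the block name and length need to be handed to a worker.
        return read_sum(shm.name, len(values))
    finally:
        shm.close()
        shm.unlink()  # the creator releases the block when done
```

Whether this would actually beat the single-threaded path for the importance computation is untested here; it only shows the mechanism that avoids copying the data into each process.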

parrt avatar Mar 21 '19 21:03 parrt

Okay, that makes sense. I'm just curious: how much longer does it take, and how large is your dataset? It'll be interesting to see what happens when Python 3.8 gets released.

RohanBhandari avatar Mar 22 '19 00:03 RohanBhandari

It's about 50% longer. My data set is about 100,000 records, I think.

parrt avatar Mar 22 '19 00:03 parrt