isolation-forest

remove private scoping to ease inspection during model development

Open eisber opened this issue 4 years ago • 4 comments

eisber • Feb 24 '20 11:02

Hi @eisber,

Thanks for submitting this pull request!

I'd prefer not to make these public, so that we avoid future complications if we choose to add new functionality (e.g. extended isolation forests) that changes some of the underlying code.

What specific quantities are you looking to calculate during model development? Perhaps a "model summary" module could print these statistics?

Best, James

jverbus • Feb 25 '20 03:02

I was trying to understand how big and deep the trees are. Ideally the API would be flexible enough that one can iterate on ideas while looking at the stats.

Maybe we can expose a visitor-pattern-style API that lets the user pass in a lambda/closure, e.g. (treeId: Int, nodeId: Int, depth: Int, splitFeatureIdx: Int, splitValue: Double) -> Unit.
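To make the suggestion concrete, here is a minimal Scala sketch of what such a visitor could look like. It is not the library's API: TreeNode, LeafNode, SplitNode and visitTree are hypothetical stand-ins invented for illustration, and the pre-order node numbering (leaves included) is an assumption.

```scala
// Hypothetical stand-ins for the library's tree nodes; NOT the real classes.
sealed trait TreeNode
final case class LeafNode(numInstances: Long) extends TreeNode
final case class SplitNode(
    left: TreeNode,
    right: TreeNode,
    splitFeatureIdx: Int,
    splitValue: Double
) extends TreeNode

object TreeVisitorSketch {
  // Callback shape suggested above:
  // (treeId, nodeId, depth, splitFeatureIdx, splitValue) => Unit
  type SplitVisitor = (Int, Int, Int, Int, Double) => Unit

  // Walks one tree in pre-order and calls `visitor` on every split node.
  // Node ids are assigned in pre-order and count leaves too (an assumption).
  // Returns the total number of nodes in the tree.
  def visitTree(treeId: Int, root: TreeNode)(visitor: SplitVisitor): Int = {
    var nextId = 0
    def go(node: TreeNode, depth: Int): Unit = {
      val nodeId = nextId
      nextId += 1
      node match {
        case SplitNode(l, r, idx, value) =>
          visitor(treeId, nodeId, depth, idx, value)
          go(l, depth + 1)
          go(r, depth + 1)
        case LeafNode(_) => () // leaves carry no split, so nothing to report
      }
    }
    go(root, 0)
    nextId
  }

  // Small hand-built tree used only to exercise the sketch.
  val exampleTree: TreeNode =
    SplitNode(
      LeafNode(3),
      SplitNode(LeafNode(1), LeafNode(2), splitFeatureIdx = 1, splitValue = 0.5),
      splitFeatureIdx = 0,
      splitValue = 2.0
    )
}
```

A caller could then collect whatever statistics they like, for example TreeVisitorSketch.visitTree(0, TreeVisitorSketch.exampleTree) { (t, n, d, i, v) => println(s"tree=$t node=$n depth=$d feature=$i value=$v") }, without the node classes themselves having to be public.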

eisber • Feb 25 '20 11:02

The depth of each tree is already accessible:

isolationForestModel.isolationTrees(0).node.subtreeDepth

We can similarly add another calculation for the number of nodes in a subtree here: https://github.com/linkedin/isolation-forest/blob/master/isolation-forest/src/main/scala/com/linkedin/relevance/isolationforest/Nodes.scala#L12
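As a rough illustration of that idea, the sketch below adds a node count alongside the depth on a simplified node hierarchy. SketchNode, SketchLeaf, SketchSplit and subtreeSize are hypothetical names, not the contents of Nodes.scala, and the convention that a leaf has depth 0 is an assumption.

```scala
// Simplified, hypothetical node hierarchy; NOT the library's Nodes.scala.
sealed trait SketchNode {
  def subtreeDepth: Int // longest path from this node down to a leaf
  def subtreeSize: Int  // number of nodes in this subtree, including this one
}

final case class SketchLeaf(numInstances: Long) extends SketchNode {
  val subtreeDepth: Int = 0 // assumed convention: a leaf has depth 0
  val subtreeSize: Int = 1
}

final case class SketchSplit(
    left: SketchNode,
    right: SketchNode,
    splitAttribute: Int,
    splitValue: Double
) extends SketchNode {
  // Both values are computed once, when the node is constructed.
  val subtreeDepth: Int = 1 + math.max(left.subtreeDepth, right.subtreeDepth)
  val subtreeSize: Int = 1 + left.subtreeSize + right.subtreeSize
}

object NodeCountSketch {
  // Two split nodes and three leaves: five nodes, depth 2.
  val example: SketchNode =
    SketchSplit(
      SketchLeaf(3),
      SketchSplit(SketchLeaf(1), SketchLeaf(2), splitAttribute = 1, splitValue = 0.5),
      splitAttribute = 0,
      splitValue = 2.0
    )
}
```

With a field like this in place, the node count would be read the same way as the depth, e.g. NodeCountSketch.example.subtreeSize next to NodeCountSketch.example.subtreeDepth.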

jverbus • Feb 25 '20 18:02

Ah, thanks for pointing out the subtreeDepth property. How would you accommodate any visualization someone might want to create?

Overall, I understand the desire to reduce the API surface, but at the same time it feels like this restricts ad-hoc data science a bit too much. Is there a middle ground (e.g. issuing a warning or marking it)?

eisber • Feb 26 '20 15:02