Object storage sizes

Open ugroempi opened this issue 3 years ago • 4 comments

I have been working with an interaction forest (intaus) on 2000 observations that consists of the default 20000 trees (from package diversityForest). This forest uses 306 682 KB of disk space. I have applied Interaction$new and FeatureEffects$new to that forest and stored the resulting objects on disk (R workspaces with a single object each). I end up with the following stored object sizes:

hilf <- Predictor$new(intaus, data=as.data.frame(yx2[,-1]), y=yx2[,1], predict.function=predfun)
hilf2 <- Interaction$new(hilf)
## storage size is 1 227 232 KB
fes <- FeatureEffects$new(hilf)
## storage size is 1 248 226 KB

To me, these sizes appear excessive. I wonder what functionalities of these objects I might miss that justify these huge object sizes. Or would it perhaps be possible for Interaction$new and FeatureEffects$new to return smaller objects without sacrificing functionality?

Best, Ulrike

ugroempi avatar May 01 '21 11:05 ugroempi

What is your use case for storing these objects?

One reason for the size is that the Predictor is part of Interaction / FeatureEffects. But that does not seem to fully explain the size; maybe it is stored more than once.

christophM avatar May 06 '21 08:05 christophM

The use case is that I don't want to invest the run time again, and want to have them available later e.g. for plotting or printing in comparison to other numbers calculated elsewhere.

ugroempi avatar May 06 '21 08:05 ugroempi

I have not tried it yet, but you could try setting the predictor to NULL: interaction_object$predictor = NULL. This should make the object a lot smaller. The results are stored in a data.frame in $results, and the plotting should not be affected by it either. It's a hacky solution, so I can't guarantee it works right away.

christophM avatar May 06 '21 08:05 christophM
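
The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: a small lm fit on mtcars stands in for the interaction forest from this issue, and the helper ser_size is a hypothetical name introduced here. It compares the serialized size before and after dropping $predictor, and also shows the more conservative option of saving only the $results table.

```r
library(iml)

# Assumption: a small linear model on mtcars stands in for the forest.
fit <- lm(mpg ~ ., data = mtcars)
pred <- Predictor$new(fit, data = mtcars[, -1], y = mtcars$mpg)
ia <- Interaction$new(pred)

# Keep a copy of the results table before touching the object.
res <- ia$results

# Hypothetical helper: number of bytes the object occupies when serialized.
ser_size <- function(x) length(serialize(x, connection = NULL))

size_full <- ser_size(ia)
ia$predictor <- NULL          # the workaround proposed in this thread
size_stripped <- ser_size(ia)

# Alternative: store only the results and rebuild plots or tables from them.
path <- tempfile(fileext = ".rds")
saveRDS(res, path)
restored <- readRDS(path)
```

Saving only $results avoids relying on methods that may still expect a predictor, at the cost of losing the object's own plot method.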

Thank you for the proposal. After setting the $predictor to NULL, the file size was only 252 kB. The plot method still works, the print method doesn't (but I can of course access the $results nevertheless).

I think it would be highly desirable for the output objects for interactions and feature effects to be more parsimonious by default (green ML!).

By the way, from within R I found it quite difficult to assess object sizes. object.size(hilf2) returned a size of 448 bytes (!) for the huge object. That size remains unchanged after removing $predictor.

ugroempi avatar May 06 '21 09:05 ugroempi
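
A likely explanation for the 448 bytes is that iml objects are R6 objects, which are environments, and object.size() does not count what an environment contains. The base R sketch below illustrates this with a plain environment as a hypothetical stand-in for the R6 object (the field names and the helper serialized_size are made up for illustration). Measuring the length of the serialized object gives a figure much closer to what ends up on disk.

```r
# Hypothetical stand-in for an R6 object: an environment with a large field.
obj <- new.env()
obj$predictor <- rnorm(1e6)  # roughly 8 MB of doubles
obj$results <- data.frame(feature = c("a", "b"), value = c(0.1, 0.2))

# object.size() ignores the contents of the environment.
reported_before <- as.numeric(object.size(obj))

# Hypothetical helper: bytes needed to serialize the object with its contents.
serialized_size <- function(x) length(serialize(x, connection = NULL))
actual_before <- serialized_size(obj)

# Drop the large field, as done with $predictor in this thread.
obj$predictor <- NULL
reported_after <- as.numeric(object.size(obj))
actual_after <- serialized_size(obj)

cat("object.size before/after:", reported_before, reported_after, "\n")
cat("serialized  before/after:", actual_before, actual_after, "\n")
```

If installing an extra package is acceptable, lobstr::obj_size() is another option, since it includes the contents of environments in its estimate.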