Analyze and improve speed and memory consumption

Open JanBenisek opened this issue 3 years ago • 8 comments

We had a use case at Argenta where we worked with a table of about 300 columns and ~2 million rows. There, preprocessing took a lot of time and, above all, a lot of memory.

What we need is a dataset of similar size that is close to reality (a mixture of categorical, flag and continuous variables, with missing values), so that we can measure how much memory Cobra uses and how slow it is.
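If no suitable public dataset turns up, a synthetic table could serve as a stand-in. The following is only a minimal sketch, assuming pandas and NumPy are available; the helper names (`make_synthetic_table`, `memory_mb`), the column names and the default sizes are all hypothetical and not part of Cobra.

```python
import numpy as np
import pandas as pd


def make_synthetic_table(n_rows=1_000, n_cont=4, n_cat=3, n_flag=3,
                         missing_frac=0.05, seed=0):
    """Build a mixed-type table with missing values (all names are made up)."""
    rng = np.random.default_rng(seed)
    data = {}
    # Continuous variables: random floats with some NaNs.
    for i in range(n_cont):
        col = rng.normal(size=n_rows)
        col[rng.random(n_rows) < missing_frac] = np.nan
        data[f"cont_{i}"] = col
    # Categorical variables: strings with some missing entries.
    for i in range(n_cat):
        col = rng.choice(["a", "b", "c", "d"], size=n_rows).astype(object)
        col[rng.random(n_rows) < missing_frac] = None
        data[f"cat_{i}"] = col
    # Flags and a binary target: 0/1 integers without missing values.
    for i in range(n_flag):
        data[f"flag_{i}"] = rng.integers(0, 2, size=n_rows)
    data["target"] = rng.integers(0, 2, size=n_rows)
    return pd.DataFrame(data)


def memory_mb(df):
    """Total in-memory size of the DataFrame in MiB, including string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


df = make_synthetic_table()
print(df.shape, f"{memory_mb(df):.2f} MiB")
```

Starting small and then raising `n_rows` and the column counts towards ~2 million rows and ~300 columns should show how the footprint of the input alone scales, before any preprocessing is applied.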

The issue occurs in preprocessor.fit() and preprocessor.transform(), but both do a lot behind the scenes, so I am trying to pinpoint the cause. Is it the binning? The incidence replacement? Or are the data types of the intermediate tables inefficient, so that they take too much memory?
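One way to narrow this down is to time each step and record its peak memory separately. Below is a rough sketch that uses only the standard library (`time` and `tracemalloc`); `profile_call` is a hypothetical helper, not something Cobra provides, and `tracemalloc` only sees allocations that are reported to it, so memory allocated elsewhere may be missed.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak traced memory in MiB)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak / 1024 ** 2


# Stand-in workload; in practice this would be one preprocessing step.
result, seconds, peak_mb = profile_call(lambda: [0] * 1_000_000)
print(f"{seconds:.3f} s, peak {peak_mb:.1f} MiB")
```

Passing `preprocessor.fit` and `preprocessor.transform` (with their usual arguments) instead of the stand-in lambda, and then the individual sub-steps such as binning, should indicate which one dominates. Tracing slows execution down, so the timings are best read as relative rather than absolute.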

Once we find the cause, we can figure out how to fix it.

JanBenisek, Mar 12 '21 08:03