The `preprocessed_features` notebook hangs when using a large number of features

Open • jbiggsets opened this issue 7 years ago • 2 comments

When the number of features is large, the `preprocessed_features` notebook hangs and report generation times out. The bottleneck seems to be the training set distribution graphs in this notebook.

It would be useful to update this notebook in one of two ways: (1) drop the histograms if the number of features exceeds some threshold X, or, preferably, (2) generate the histograms from a random subset of the data, with a warning to the user that they are not based on the full dataset.
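A rough sketch of how the two options could fit together, assuming the training data is available as a pandas DataFrame. The function name `select_histogram_data` and both thresholds are hypothetical and not taken from the notebook; the real values would need tuning.

```python
import warnings

import pandas as pd

# Hypothetical defaults; the right values would need to be tuned.
MAX_FEATURES_FOR_HISTOGRAMS = 150
MAX_ROWS_FOR_HISTOGRAMS = 5000


def select_histogram_data(df_train, feature_columns,
                          max_features=MAX_FEATURES_FOR_HISTOGRAMS,
                          max_rows=MAX_ROWS_FOR_HISTOGRAMS,
                          random_state=1234):
    """Return the data to plot, or None if the histograms should be skipped."""
    feature_columns = list(feature_columns)

    # Option (1): too many features, so drop the histograms entirely.
    if len(feature_columns) > max_features:
        warnings.warn(
            f"Skipping histograms: {len(feature_columns)} features "
            f"exceeds the threshold of {max_features}."
        )
        return None

    df_plot = df_train[feature_columns]

    # Option (2): too many rows, so plot a random subset and warn the user.
    if len(df_plot) > max_rows:
        warnings.warn(
            f"Histograms are based on a random subset of {max_rows} rows, "
            f"not the full dataset of {len(df_plot)} rows."
        )
        df_plot = df_plot.sample(n=max_rows, random_state=random_state)

    return df_plot
```

The returned frame (if not `None`) would then be passed to whatever plotting call the notebook already uses. Fixing `random_state` keeps the subset reproducible across report runs.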

May 09 '18 19:05 jbiggsets

This was originally an issue with `kde=True`. However, with a large number of features, the notebook still appears to time out. Looking into this more.

Oct 25 '18 13:10 jbiggsets

Is this still an issue, @jbiggsets?

Sep 11 '19 14:09 desilinguist