
How to find a dataset with big data size, let's say 10 million rows and 80 features

Sandy4321 opened this issue 2 years ago · 0 comments

Great work on "Why do tree-based models still outperform deep learning on tabular data?"

Can you recommend a dataset with mixed continuous and categorical features for binary classification at a big data size, let's say 10 million rows and 80 features?

Ideally a dataset where:
1. The features are not independent, for example some features depend on several other features.
2. The data is imbalanced, with many more NO labels than YES labels.

For example, https://www.kaggle.com/competitions/amex-default-prediction/data

https://github.com/jxzly/Kaggle-American-Express-Default-Prediction-1st-solution
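In case a quick stand-in is useful while searching for such a real dataset, here is a minimal sketch (assuming scikit-learn and NumPy) of how data with roughly these properties could be simulated. The sample size, imbalance ratio, number of redundant features, and the crude discretization into pseudo-categorical columns are illustrative choices, not something from this issue or the paper:

```python
# Illustrative sketch only: simulate a dataset with roughly the requested shape
# (many rows, ~80 features, dependent features, strong class imbalance).
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1_000_000,      # scale toward 10M rows if memory allows
    n_features=80,
    n_informative=30,
    n_redundant=20,           # redundant features are linear combinations of informative ones
    weights=[0.95, 0.05],     # ~5% positive labels -> imbalanced NO/YES split
    random_state=0,
)

# Crudely bin a few columns into quartiles to mimic categorical features.
for j in range(10):
    X[:, j] = np.digitize(X[:, j], bins=np.quantile(X[:, j], [0.25, 0.5, 0.75]))

print("positive rate:", y.mean())
```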

Sandy4321 · Sep 12 '22