tabular-benchmark
How to find a large dataset, e.g. 10 million rows and 80 features?
Great work on "Why do tree-based models still outperform deep learning on tabular data?"
Could you recommend a dataset for binary classification with mixed continuous and categorical features at a large scale, say 10 million rows and 80 features, that also has the following properties?
1. The features are not independent: for example, some features depend on several other features.
2. The data is imbalanced: there are many more NO labels than YES labels.
For example, something like https://www.kaggle.com/competitions/amex-default-prediction/data
https://github.com/jxzly/Kaggle-American-Express-Default-Prediction-1st-solution