How can I pass train/test splits to auto ml for classification?

Open omarcr opened this issue 8 years ago • 3 comments

What should we do when we don't have a "column description", only a feature matrix and a vector of labels?

X_train, X_test, y_train, y_test = train_test_split(X, Y,
                                                    train_size=0.75, test_size=0.25)

omarcr avatar Aug 22 '17 16:08 omarcr

ah! i was debating whether it would be useful to let people pass in data like that or not. you're actually the first person to request it.

for now, just make a DataFrame from it (both X_train and X_test), and then make y_train a column in there. then you should be able to make a column_descriptions object pretty easily.
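A minimal sketch of that workaround, assuming `X_train` is a 2-D NumPy array and `y_train` a 1-D label vector. The helper `to_labeled_frame`, the generated `feature_i` column names, and the `label` column name are all made up for illustration. The auto_ml calls are left as comments and follow the pattern in the project's README as far as I recall it, so check them against the version you have installed.

```python
import numpy as np
import pandas as pd


def to_labeled_frame(X, y, label_col="label"):
    """Wrap a feature matrix and a label vector in a single DataFrame."""
    X = np.asarray(X)
    # Hypothetical column names; use real feature names if you have them.
    columns = [f"feature_{i}" for i in range(X.shape[1])]
    df = pd.DataFrame(X, columns=columns)
    df[label_col] = np.asarray(y)
    return df


# Toy data standing in for the output of train_test_split.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))
y_train = rng.integers(0, 2, size=6)

df_train = to_labeled_frame(X_train, y_train)

# Tell auto_ml which column holds the labels.
column_descriptions = {"label": "output"}

# Assumed usage, based on the auto_ml README (not run here):
# from auto_ml import Predictor
# ml_predictor = Predictor(type_of_estimator="classifier",
#                          column_descriptions=column_descriptions)
# ml_predictor.train(df_train)
```

Build `df_test` the same way from `X_test` and `y_test` so both frames share the same column names.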

i'll see if we can add direct support for numpy matrices as input in the future!

ClimbsRocks avatar Aug 22 '17 21:08 ClimbsRocks

I take no credit for this, but a related question on Stack Overflow generated this response (the question was about a train/validate/test split):

This will split your dataframe into a 60/20/20 set of dataframes:

train, validate, test = np.split(df.sample(frac=1), [int(.6*len(df)), int(.8*len(df))])

If you only want one split (e.g., 80/20, as below), you can use this:

train, test = np.split(df.sample(frac=1), [int(.8*len(df))])
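For reference, here is a sketch of the same shuffle-then-cut idea written with `iloc` slices, so it does not depend on how `np.split` handles DataFrames. The function name `split_frame`, the `cuts` parameter, and the fixed `seed` are my own choices for illustration, not part of the Stack Overflow answer.

```python
import pandas as pd


def split_frame(df, cuts=(0.6, 0.8), seed=0):
    """Shuffle df, then cut it at the given cumulative fractions."""
    # sample(frac=1) returns every row in random order; the seed makes it repeatable.
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    bounds = [0] + [int(c * n) for c in cuts] + [n]
    # Consecutive boundary pairs delimit each piece.
    return [shuffled.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]


# Toy frame with 100 rows.
df = pd.DataFrame({"x": range(100), "label": [i % 2 for i in range(100)]})

train, validate, test = split_frame(df)             # 60/20/20
train_80, test_20 = split_frame(df, cuts=(0.8,))    # single 80/20 split
```

Each piece keeps its original index, so you can check that the pieces don't overlap.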

onacrame avatar Oct 15 '17 05:10 onacrame

This would be very relevant for compatibility with MultiOutputRegressor from sklearn.multioutput.

GlennCeusters avatar Apr 09 '18 13:04 GlennCeusters