auto_ml
How can I pass train/test splits to auto ml for classification?
What should we do when we don't have a "column description" and only have a vector of labels?
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y,
    train_size=0.75, test_size=0.25)
ah! i was debating whether it would be useful to let people pass in data like that or not. you're actually the first person to request it.
for now, just make a DataFrame from it (both X_train and X_test), and then make y_train a column in there. then you should be able to make a column_descriptions object pretty easily.
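A rough sketch of that workaround, in case it helps. The synthetic arrays, the feature_N column names, and the "label" column name are all made up for illustration. The commented-out Predictor calls follow the usage shown in the auto_ml README as I understand it, so treat them as an assumption and check them against your installed version.

```python
import numpy as np
import pandas as pd

# Stand-in arrays; in practice these would come from train_test_split.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(6, 3)), rng.normal(size=(2, 3))
y_train, y_test = rng.integers(0, 2, size=6), rng.integers(0, 2, size=2)

# Hypothetical column names, since a raw matrix has none.
feature_names = [f"feature_{i}" for i in range(X_train.shape[1])]

# Build one DataFrame per split and attach the labels as a column.
df_train = pd.DataFrame(X_train, columns=feature_names)
df_train["label"] = y_train
df_test = pd.DataFrame(X_test, columns=feature_names)
df_test["label"] = y_test

# Mark the label column as the output to predict.
column_descriptions = {"label": "output"}

# Assumed auto_ml usage (not executed here):
# from auto_ml import Predictor
# ml_predictor = Predictor(type_of_estimator="classifier",
#                          column_descriptions=column_descriptions)
# ml_predictor.train(df_train)
# ml_predictor.score(df_test, df_test.label)
```

The only thing auto_ml needs beyond the DataFrame is the name of the output column, so any column name works as long as column_descriptions points at it.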
i'll see if we can add direct support for numpy matrices as input in the future!
I take no credit for this, but a related question on Stack Overflow generated this response (the question was about a train/validate/test split):
This will split your dataframe into a 60/20/20 set of dataframes:
train, validate, test = np.split(df.sample(frac=1), [int(.6*len(df)), int(.8*len(df))])
If you only want one split (e.g., 80/20, as below), you can use this:
train, test = np.split(df.sample(frac=1), [int(.8*len(df))])
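For anyone who wants the split to be reproducible, here is a small sketch of the same 60/20/20 idea using a fixed random_state and plain iloc slicing instead of np.split. The toy DataFrame and the seed value are placeholders. I believe newer pandas releases may emit a deprecation warning when np.split is called on a DataFrame, but I have not checked every version, so take that as an assumption.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real DataFrame.
df = pd.DataFrame({"x": np.arange(10), "label": np.arange(10) % 2})

# Shuffle once with a fixed seed so the split is repeatable.
shuffled = df.sample(frac=1, random_state=42)

# Same cut points as the np.split one-liner: 60% and 80% of the rows.
n = len(shuffled)
train_end, validate_end = int(0.6 * n), int(0.8 * n)

train = shuffled.iloc[:train_end]
validate = shuffled.iloc[train_end:validate_end]
test = shuffled.iloc[validate_end:]
```

Each resulting DataFrame keeps its label column, so it can be handed to auto_ml together with a column_descriptions dict.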
This will be very relevant for compatibility with the MultiOutputRegressor from sklearn.multioutput.