mljar-supervised
Custom validation/test set - turn off cross-validation (CV)
As title suggests. How do I do it please?
The fit() method accepts an additional cv argument. If validation is set to custom with validation_strategy={"validation_type": "custom"}, then the cv parameter is used for validation. The cv argument should be a list of tuples, where each tuple defines the train and validation indices for one split.
For custom validation, the stacking and boost-on-errors steps are turned OFF by default. They run only if the user enables them explicitly.
import numpy as np
import pandas as pd
from supervised.automl import AutoML
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn import datasets
X, y = datasets.make_classification(
    n_samples=100,
    n_features=5,
    n_informative=4,
    n_redundant=1,
    n_classes=2,
    n_clusters_per_class=3,
    n_repeated=0,
    shuffle=False,
    random_state=0,
)
X = pd.DataFrame(X)
y = pd.Series(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.reset_index(inplace=True, drop=True)
y_train.reset_index(inplace=True, drop=True)
folds = pd.Series(X_train.index % 4)
splits = [
    ([0], [1, 2, 3]),
    ([0, 1], [2, 3]),
    ([0, 1, 2], [3]),
]
# define train and validation indices
cv = []
for split in splits:
    train_indices = X_train.index[folds.isin(split[0])]
    validation_indices = X_train.index[folds.isin(split[1])]
    cv.append((train_indices, validation_indices))
automl = AutoML(
    mode="Compete",
    algorithms=["Xgboost"],
    eval_metric="accuracy",
    start_random_models=1,
    validation_strategy={
        "validation_type": "custom"
    },
)
automl.fit(X_train, y_train, cv=cv)
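Before passing a hand-built cv list to fit(), it may help to check that every tuple is sensible. The sketch below is not part of mljar-supervised; check_cv is a hypothetical helper, and it assumes the indices are positional row numbers (which matches the example above, where the index is reset to 0..n-1).

```python
import pandas as pd

def check_cv(cv, n_rows):
    """Hypothetical helper: raise ValueError if a (train, validation) pair
    is empty, overlapping, or points outside 0..n_rows-1."""
    for i, (train_idx, val_idx) in enumerate(cv):
        train, val = set(train_idx), set(val_idx)
        if not train or not val:
            raise ValueError(f"split {i}: empty train or validation indices")
        if train & val:
            raise ValueError(f"split {i}: train and validation indices overlap")
        if min(train | val) < 0 or max(train | val) >= n_rows:
            raise ValueError(f"split {i}: index outside 0..{n_rows - 1}")
    return True

# Same fold scheme as the example above, on a small frame with 8 rows.
X_small = pd.DataFrame({"a": range(8)})
folds = pd.Series(X_small.index % 4)
splits = [([0], [1, 2, 3]), ([0, 1], [2, 3]), ([0, 1, 2], [3])]
cv_small = [
    (
        X_small.index[folds.isin(tr).to_numpy()],
        X_small.index[folds.isin(va).to_numpy()],
    )
    for tr, va in splits
]
cv_ok = check_cv(cv_small, n_rows=len(X_small))
```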
You will find more examples in this discussion https://github.com/mljar/mljar-supervised/issues/401
@cibic89 please let me know if it works for you? BTW, what type of validation are you going to use?
Thank you for your reply.
I have separate train and validation sets, and I cannot merge them to use “straight” cross-validation.
It is similar to this one https://github.com/mljar/mljar-supervised/issues/401#issuecomment-852909390 - it should work.
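For separate train and validation sets, one possible way to build the cv list is to stack both sets into a single frame and pass one tuple whose indices point back at the original rows. This is a minimal sketch, assuming both sets share the same columns; X_tr, y_tr, X_val and y_val are hypothetical stand-ins for the user's data, and the AutoML call is left as a comment that mirrors the example earlier in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the separate train and validation sets.
rng = np.random.default_rng(0)
X_tr = pd.DataFrame(rng.normal(size=(80, 5)))
y_tr = pd.Series(rng.integers(0, 2, size=80))
X_val = pd.DataFrame(rng.normal(size=(20, 5)))
y_val = pd.Series(rng.integers(0, 2, size=20))

# Stack both sets into one frame with a fresh 0..n-1 index.
X_all = pd.concat([X_tr, X_val], ignore_index=True)
y_all = pd.concat([y_tr, y_val], ignore_index=True)

# A single split: the first rows train the model, the remaining rows validate it.
train_indices = X_all.index[: len(X_tr)]
validation_indices = X_all.index[len(X_tr):]
cv = [(train_indices, validation_indices)]

# Then, as in the example earlier in the thread (requires mljar-supervised):
# automl = AutoML(validation_strategy={"validation_type": "custom"})
# automl.fit(X_all, y_all, cv=cv)
```

With a single tuple in cv, no data from the validation set is used for training, so the two sets stay separate even though they live in one frame.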