Custom validation/test set - turn off cross-validation (CV)

Open cibic89 opened this issue 3 years ago • 3 comments

As title suggests. How do I do it please?

cibic89 avatar Nov 25 '21 11:11 cibic89

There is an additional cv argument in fit(). If the validation strategy is set to custom with validation_strategy={"validation_type": "custom"}, then the cv parameter is used for validation. cv should be a list of tuples, where each tuple defines the train and validation indices for one split.

For custom validation, the stacking and boost-on-errors steps are turned OFF by default. They have to be enabled explicitly by the user.

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

X, y = datasets.make_classification(
    n_samples=100,
    n_features=5,
    n_informative=4,
    n_redundant=1,
    n_classes=2,
    n_clusters_per_class=3,
    n_repeated=0,
    shuffle=False,
    random_state=0,
)

X = pd.DataFrame(X)
y = pd.Series(y)

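X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# reset the index so that row positions and index labels match
X_train.reset_index(inplace=True, drop=True)
y_train.reset_index(inplace=True, drop=True)

# assign every training row to one of 4 folds
folds = pd.Series(X_train.index % 4)

# each entry: (folds used for training, folds used for validation)
splits = [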
    ([0], [1,2,3]),
    ([0,1], [2,3]),
    ([0,1,2], [3]),
]

# define train and validation indices
cv = []
for split in splits:
    train_indices = X_train.index[folds.isin(split[0])]
    validation_indices = X_train.index[folds.isin(split[1])]
    cv += [(train_indices, validation_indices)]


automl = AutoML(
    mode="Compete",
    algorithms=["Xgboost"],
    eval_metric="accuracy",
    start_random_models=1,
    validation_strategy={
        "validation_type": "custom"
    },
)
automl.fit(X_train, y_train, cv=cv)

You will find more examples in this discussion https://github.com/mljar/mljar-supervised/issues/401

@cibic89 please let me know if it works for you. BTW, what type of validation are you going to use?

pplonski avatar Nov 25 '21 12:11 pplonski

Thank you for your reply.

I have separate train and validation sets, and I cannot merge them into one to use “straight” cross-validation.

cibic89 avatar Nov 25 '21 12:11 cibic89

It is similar to this one https://github.com/mljar/mljar-supervised/issues/401#issuecomment-852909390 - it should work.

pplonski avatar Nov 25 '21 13:11 pplonski
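
For the case described above, where the train and validation sets are kept separate, one possible approach is to stack the two sets into a single frame and pass a cv list containing exactly one (train, validation) tuple. The following is a minimal sketch and not code from the thread: the helper name build_holdout_cv and the toy data are made up for illustration, and it assumes that fit() accepts NumPy index arrays in cv in the same way as the pandas index objects used in the example above.

```python
import numpy as np
import pandas as pd


def build_holdout_cv(X_train, y_train, X_valid, y_valid):
    """Stack separate train/validation sets and describe them as one custom split.

    Hypothetical helper, not part of mljar-supervised.
    """
    # Concatenate so that training rows come first, then validation rows.
    X_all = pd.concat([X_train, X_valid], ignore_index=True)
    y_all = pd.concat([y_train, y_valid], ignore_index=True)

    n_train = len(X_train)
    train_indices = np.arange(n_train)
    validation_indices = np.arange(n_train, len(X_all))

    # A single (train, validation) tuple means one hold-out split, no k-fold.
    cv = [(train_indices, validation_indices)]
    return X_all, y_all, cv


# Tiny made-up data, only to show the shapes involved.
X_tr = pd.DataFrame({"f0": [0.1, 0.2, 0.3, 0.4], "f1": [1, 0, 1, 0]})
y_tr = pd.Series([0, 1, 0, 1])
X_va = pd.DataFrame({"f0": [0.5, 0.6], "f1": [1, 1]})
y_va = pd.Series([1, 0])

X_all, y_all, cv = build_holdout_cv(X_tr, y_tr, X_va, y_va)

# With mljar-supervised installed, the custom-validation call would then be:
# automl = AutoML(validation_strategy={"validation_type": "custom"})
# automl.fit(X_all, y_all, cv=cv)
```

Because the rows are re-indexed from 0 after concatenation, the index arrays refer to row positions in X_all, which mirrors the reset_index step in the example above.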