Problem using PredefinedSplit as a fold generator
Hi, and thanks for the great work on optunity,
I am trying to use optunity.cross_validated, passing it predefined splits generated with sklearn's PredefinedSplit. After seeing the example with StratifiedKFold, I expected PredefinedSplit to work too. Nonetheless, I'm guessing the problem is not the fold generator itself but the number of folds generated, since I only generate 1 test fold.
This is my current data setup:
In [116]: X.shape
Out[116]: (11342, 16955)
In [117]: y.shape
Out[117]: (11342,)
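For reference, cv is a PredefinedSplit with a single test fold. It is built roughly like this (the exact test_fold assignment comes from my pipeline, so treat this as a sketch):

import numpy as np
from sklearn.model_selection import PredefinedSplit

# -1 marks rows that always stay in the training set;
# 0 marks the single predefined test fold (rows 5709..11341 here).
test_fold = np.full(X.shape[0], -1, dtype=int)
test_fold[5709:] = 0
cv = PredefinedSplit(test_fold)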
In [115]: for train, test in cv.split():
...: print(train)
...: print(test)
...:
[ 0 1 2 ..., 5706 5707 5708]
[ 5709 5710 5711 ..., 11339 11340 11341]
folds = [[list(test) for train, test in cv.split()]]
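That is, folds ends up as a nested list with one cross-validation iteration containing a single fold of test indices:

len(folds)        # number of cross-validation iterations: 1
len(folds[0])     # number of folds in that iteration: 1
len(folds[0][0])  # number of indices in that test fold: 5633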
Then, I decorate my cost function as follows:
@optunity.cross_validated(x=data, y=y, folds=folds, num_folds=len(folds[0]))
def lasso_cost(x_train, y_train, x_test, y_test, alpha):
    print(x_train.shape, y_train.shape)
    print(x_test.shape, y_test.shape)
    model = Lasso(alpha=alpha).fit(x_train, y_train)
    return optunity.metrics.absolute_error(y_test, model.predict(x_test))
The prints are for debugging purposes. The problem is that optunity throws this error:
In [120]: lasso_cost(0.1)
(0, 16955) (0,)
(5633, 16955) (5633,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-06e8ef0f0b50> in <module>()
----> 1 lasso_cost(0.1)
/scratch/gaa/local/src/anaconda2/envs/alecatpy35/lib/python3.5/site-packages/optunity/cross_validation.py in __call__(self, *args, **kwargs)
401 kwargs['y_train'] = select(self.y, rows_train)
402 kwargs['y_test'] = select(self.y, rows_test)
--> 403 scores.append(self.f(**kwargs))
404 return self.reduce(scores)
405
<ipython-input-119-edce58ac4ce5> in lasso_cost(x_train, y_train, x_test, y_test, alpha)
3 print(x_train.shape, y_train.shape)
4 print(x_test.shape, y_test.shape)
----> 5 model = Lasso(alpha=alpha).fit(x_train, y_train)
6 return optunity.metrics.absolute_error(y_test, model.predict(x_test))
/scratch/gaa/local/src/anaconda2/envs/alecatpy35/lib/python3.5/site-packages/sklearn/linear_model/coordinate_descent.py in fit(self, X, y, check_input)
675 order='F', dtype=[np.float64, np.float32],
676 copy=self.copy_X and self.fit_intercept,
--> 677 multi_output=True, y_numeric=True)
678 y = check_array(y, order='F', copy=False, dtype=X.dtype.type,
679 ensure_2d=False)
/scratch/gaa/local/src/anaconda2/envs/alecatpy35/lib/python3.5/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
519 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
520 ensure_2d, allow_nd, ensure_min_samples,
--> 521 ensure_min_features, warn_on_dtype, estimator)
522 if multi_output:
523 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/scratch/gaa/local/src/anaconda2/envs/alecatpy35/lib/python3.5/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
414 " minimum of %d is required%s."
415 % (n_samples, shape_repr, ensure_min_samples,
--> 416 context))
417
418 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0, 16955)) while a minimum of 1 is required.
From what I understand, it's getting the test indices right but leaving the train set with 0 samples. Any pointers on why this may be happening? Shouldn't it assign the remaining indices to the train set? Needless to say, this works well with sklearn's LassoCV model, where I pass the cv splitter directly to the model.
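For comparison, this is roughly how I call LassoCV with the same splitter (a sketch, not my exact call); it derives the complementary train indices by itself:

from sklearn.linear_model import LassoCV

# LassoCV takes the splitter directly and uses the complement of each
# predefined test fold as the training set.
model = LassoCV(cv=cv).fit(X, y)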
I have tested different settings where I only generate 1 test fold, and this error occurs in all of them. Does optunity simply not support the single-fold case?
Thank you for your attention.