Error when calling `fit_transform` for `Vset` with `is_async=True`

Open jpdunc23 opened this issue 3 years ago • 0 comments

In the example below, when using a `Vset` with `is_async=True`, the `transform` method expects to receive a `ray.ObjectRef` and call `ray.get` on it, but it receives an array instead:

```python
from vflow import build_vset, init_args

import numpy as np

from sklearn.decomposition import PCA
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

import ray

ray.init(num_cpus=4)

X, y = make_regression(n_samples=1000, n_features=100, n_informative=1)

X_trainval, X_test, y_trainval, y_test = train_test_split(X, y)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval)

X_train, y_train = init_args([X_train, y_train], names=['X_train', 'y_train'])
X_val, y_val = init_args([X_val, y_val], names=['X_val', 'y_val'])

# create a Vset for bootstrapping from data 10 times
# we use lazy=True so that the data will not be resampled until needed
boot_set = build_vset('boot', resample, reps=10, lazy=True)

# bootstrap from training data by calling boot_set
X_trains, y_trains = boot_set(X_train, y_train)

# hyperparameters to try
pca_params = {
    'n_components': [10, 20, 50],
    'svd_solver': ['randomized', 'full', 'auto']
}

# we could instead pass a list of distinct models and corresponding param dicts
pca_set = build_vset('PCA', PCA, pca_params, is_async=True)

# this call raises the TypeError shown below
X_trains_pca = pca_set.fit_transform(X_trains)
```

```
TypeError: Attempting to call `get` on the value [[-0.73763296 -1.64044139 -0.74793088 ... -0.1085027  -0.25652127
   0.11583096]
...
```
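
The message indicates that `get` was handed a plain array rather than an object reference. As a rough illustration of that failure mode only (not vflow's actual code, and not the workaround from the linked issue), the sketch below resolves a value only when it really is a reference. `StandInRef`, `stand_in_get`, and `resolve` are hypothetical names, and the stand-ins take the place of `ray.ObjectRef` and `ray.get` so the snippet runs without a Ray cluster.

```python
class StandInRef:
    """Hypothetical stand-in for ray.ObjectRef so the sketch runs without Ray."""

    def __init__(self, value):
        self.value = value


def stand_in_get(ref):
    """Hypothetical stand-in for ray.get: rejects anything that is not a reference."""
    if not isinstance(ref, StandInRef):
        raise TypeError(f"Attempting to call `get` on the value {ref!r}")
    return ref.value


def resolve(value, ref_type, getter):
    """Call getter only on instances of ref_type; return anything else unchanged."""
    return getter(value) if isinstance(value, ref_type) else value


plain = [1.0, 2.0]                  # already-materialized data, like the array in the error
wrapped = StandInRef([3.0, 4.0])    # data still behind a reference

print(resolve(plain, StandInRef, stand_in_get))
print(resolve(wrapped, StandInRef, stand_in_get))
```

With Ray installed, the equivalent call would presumably be `resolve(out, ray.ObjectRef, ray.get)`; whether a guard like this belongs inside vflow is an open question for the maintainers.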

See https://github.com/Yu-Group/veridical-flow/issues/50 for a possible workaround until this is fixed.

jpdunc23, Nov 15 '22 21:11