veridical-flow
Error when calling `fit_transform` for `Vset` with `is_async=True`
In the example below, when a `Vset` is built with `is_async=True`, its `transform` method expects to receive a `ray.ObjectRef` and calls `ray.get` on it, but it receives a plain array instead:
```python
from vflow import build_vset, init_args
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
import ray

ray.init(num_cpus=4)

X, y = make_regression(n_samples=1000, n_features=100, n_informative=1)
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval)
X_train, y_train = init_args([X_train, y_train], names=['X_train', 'y_train'])
X_val, y_val = init_args([X_val, y_val], names=['X_val', 'y_val'])

# create a Vset for bootstrapping from data 10 times
# we use lazy=True so that the data will not be resampled until needed
boot_set = build_vset('boot', resample, reps=10, lazy=True)

# bootstrap from training data by calling boot_set
X_trains, y_trains = boot_set(X_train, y_train)

# hyperparameters to try
pca_params = {
    'n_components': [10, 20, 50],
    'svd_solver': ['randomized', 'full', 'auto']
}

# we could instead pass a list of distinct models and corresponding param dicts
pca_set = build_vset('PCA', PCA, pca_params, is_async=True)

# this call raises the TypeError below
X_trains_pca = pca_set.fit_transform(X_trains)
```
```
TypeError: Attempting to call `get` on the value [[-0.73763296 -1.64044139 -0.74793088 ... -0.1085027 -0.25652127
0.11583096]
...
```
See https://github.com/Yu-Group/veridical-flow/issues/50 for a possible workaround until this is fixed.
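The error arises because `ray.get` is called on a value that is already a concrete NumPy array rather than a `ray.ObjectRef`. As a hedged illustration of the general guard (a hypothetical helper named `resolve`, not part of vflow and not necessarily the workaround described in issue #50), a transform step could dereference only values that are actually references and pass concrete values through unchanged. `FakeRef` and `fake_get` are hypothetical stand-ins so the sketch runs without Ray:

```python
from typing import Any, Callable


def resolve(value: Any, is_ref: Callable[[Any], bool], get: Callable[[Any], Any]) -> Any:
    """Return the concrete value behind `value` (hypothetical helper).

    `get` is called only when `is_ref(value)` is true; plain values such as
    NumPy arrays are returned unchanged instead of being passed to `get`.
    Lists and tuples are resolved element-wise.
    """
    if is_ref(value):
        return get(value)
    if isinstance(value, list):
        return [resolve(v, is_ref, get) for v in value]
    if isinstance(value, tuple):
        return tuple(resolve(v, is_ref, get) for v in value)
    return value


class FakeRef:
    """Hypothetical stand-in for a remote object reference."""

    def __init__(self, payload: Any) -> None:
        self.payload = payload


def fake_get(ref: Any) -> Any:
    # Mimics the strictness of ray.get: refuse anything that is not a reference.
    if not isinstance(ref, FakeRef):
        raise TypeError(f"Attempting to call `get` on the value {ref!r}")
    return ref.payload


def is_fake_ref(value: Any) -> bool:
    return isinstance(value, FakeRef)


# A mix of a "remote" result and an already-materialized result.
mixed = [FakeRef([1, 2]), [3, 4]]
print(resolve(mixed, is_fake_ref, fake_get))  # [[1, 2], [3, 4]]
```

With Ray installed, the same helper would presumably be called as `resolve(x, lambda v: isinstance(v, ray.ObjectRef), ray.get)`; whether vflow's own fix takes this shape is an assumption.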