[BUG] ray:IDLE processes persist even after client.disconnect()

Open nopanderer opened this issue 2 years ago • 6 comments

  • Ray Version: 1.9.2
  • tune-sklearn version: 0.4.1

I tried to run the example code below on a Ray cluster.

import ray
# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV

# Other imports
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

client = ray.init("ray://MYRAYCLUSTER")

# Set training and validation sets
X, y = make_classification(n_samples=11000, n_features=1000, n_informative=50, n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example parameters to tune from SGDClassifier
parameters = {
    'alpha': [1e-4, 1e-1, 1],
    'epsilon':[0.01, 0.1]
}

tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameters,
    early_stopping="MedianStoppingRule",
    max_iters=10
)

import time # Just to compare fit times
start = time.time()
tune_search.fit(X_train, y_train)
end = time.time()
print("Tune Fit Time:", end - start)
pred = tune_search.predict(X_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print("Tune Accuracy:", accuracy)

client.disconnect()

Even after I disconnect the client, ray:IDLE processes remain on the Ray head node. I tried other examples from Ray Core and Ray Tune, and the issue did not occur with them.

nopanderer avatar Mar 08 '22 02:03 nopanderer

Can you update Ray to the latest version and try again?

Yard1 avatar Mar 08 '22 16:03 Yard1

Thanks, Yard1. I upgraded Ray to 1.10.0 and tried again, but it still happens. When I run ray memory, the entries below persist.

IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
IP_ADDRESS | PID | Worker | (deserialize task arg) ray.tune.tune.run | xxxxxxxx.x B | LOCAL_REFERENCE | OBJECT_REF
...

nopanderer avatar Mar 10 '22 02:03 nopanderer

Ok, I'll take a look. Thanks!

Yard1 avatar Mar 10 '22 18:03 Yard1

Hey @nopanderer, this should be fixed in https://github.com/ray-project/tune-sklearn/releases/tag/v0.4.2. Please let me know whether the problem persists after updating.

Yard1 avatar Apr 04 '22 18:04 Yard1

@Yard1 I'll check it out. Thanks a lot!

nopanderer avatar Apr 05 '22 09:04 nopanderer

I got the same issue on v0.4.3. After running TuneGridSearchCV, the head node has multiple processes with IDLE status. Is there any way to kill these processes from Python?

skabbit avatar Aug 23 '22 12:08 skabbit
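
On the last question, the thread does not record an answer. One possible workaround is sketched below, using only the Python standard library: list processes with `ps`, pick out those whose command line starts with the idle-worker title, and send them a signal. This is a sketch under assumptions, not a tune-sklearn or Ray feature. The helper names `find_idle_ray_pids` and `kill_idle_ray_workers` are hypothetical, the default marker `ray::IDLE` is an assumption about how the idle workers appear in `ps` (this thread writes it as `ray:IDLE`, so adjust the marker to whatever your head node shows), and the script has to run on the head node itself on a POSIX system, not through the Ray client connection.

```python
import os
import signal
import subprocess


def find_idle_ray_pids(ps_lines, marker="ray::IDLE"):
    """Return PIDs from "<pid> <command>" lines whose command starts with marker.

    The marker default is an assumption; pass the title your `ps` output shows.
    """
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid_text, command = parts
        if pid_text.isdigit() and command.startswith(marker):
            pids.append(int(pid_text))
    return pids


def kill_idle_ray_workers(sig=signal.SIGTERM, marker="ray::IDLE"):
    """Signal every matching process on this machine; return the PIDs signalled."""
    # `-o pid= -o args=` prints the PID and command line without header rows.
    listing = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    pids = find_idle_ray_pids(listing.splitlines(), marker=marker)
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between listing and signalling
    return pids
```

Calling `kill_idle_ray_workers()` on the head node returns the list of PIDs it signalled. Ray may keep or restart idle workers on its own, so this only clears the symptom and does not address why the workers outlive `client.disconnect()`.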