pangeo-example-notebooks
machine-learning.ipynb on http://pangeo.pydata.org RuntimeError
Tried to run the cell
from sklearn.externals import joblib
with joblib.parallel_backend('dask', scatter=[X, y]):
    grid_search.fit(X, y)
and got the output (it's long...)
Possibly the RuntimeError: "Joblib backend requires either `joblib` >= '0.10.2' or `sklearn` > '0.17.1'. Please install or upgrade" is the main issue?
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-8-085d2322fa37> in <module>()
2
3 with joblib.parallel_backend('dask', scatter=[X, y]):
----> 4 grid_search.fit(X, y)
/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
637 error_score=self.error_score)
638 for parameters, (train, test) in product(candidate_params,
--> 639 cv.split(X, y, groups)))
640
641 # if one choose to see train score, "out" will contain train score info
/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
699 self._output.extend(job.get(timeout=self.timeout))
700 else:
--> 701 self._output.extend(job.get())
702
703 except BaseException as exception:
/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in get()
249
250 def get():
--> 251 return ref().result()
252
253 future.get = get # monkey patch to achieve AsyncResult API
/opt/conda/lib/python3.6/site-packages/distributed/client.py in result(self, timeout)
190 raiseit=False)
191 if self.status == 'error':
--> 192 six.reraise(*result)
193 elif self.status == 'cancelled':
194 raise result
/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
690 value = tp()
691 if value.__traceback__ is not tb:
--> 692 raise value.with_traceback(tb)
693 raise value
694 finally:
/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads()
57 def loads(x):
58 try:
---> 59 return pickle.loads(x)
60 except Exception:
61 logger.info("Failed to deserialize %s", x[:10000], exc_info=True)
/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in <module>()
38 _bases.append(ParallelBackendBase)
39 if not _bases:
---> 40 raise RuntimeError("Joblib backend requires either `joblib` >= '0.10.2' "
41 " or `sklearn` > '0.17.1'. Please install or upgrade")
42
RuntimeError: Joblib backend requires either `joblib` >= '0.10.2' or `sklearn` > '0.17.1'. Please install or upgrade
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45c7b8>, <Future finished exception=CancelledError(['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d459f28>, <Future finished exception=CancelledError(['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527620>, <Future finished exception=CancelledError(['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb52f0>, <Future finished exception=CancelledError(['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6fddf950>, <Future finished exception=CancelledError(['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb11e0>, <Future finished exception=CancelledError(['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6ed93378>, <Future finished exception=CancelledError(['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45e048>, <Future finished exception=CancelledError(['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527e18>, <Future finished exception=CancelledError(['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956'],)>)
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
ret = callback()
File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
yielded = self.gen.throw(*exc_info)
File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
result = yield _wait([future])
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
value = future.result()
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
yielded = self.gen.send(value)
File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956']
@TomAugspurger, does this mean anything to you, perhaps a joblib/sklearn release schedule thing?
Did you import dask_ml.joblib, or import distributed.joblib first?
The imports (in order) throughout the notebook are:
from dask_kubernetes import KubeCluster
from dask.distributed import Client, progress
import dask_ml.joblib # register the distributed backend
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import pandas as pd
from sklearn.externals import joblib
Thanks, that would have raised a different error anyway.
Will take a look later.
Seeing the same issue in the example notebooks.
Hopefully fixed by https://github.com/pangeo-data/helm-chart/pull/51
You could maybe work around it by adding dask-ml to the worker-template.yaml, something like
env:
  - name: EXTRA_CONDA_PACKAGES
    value: dask-ml
for now, but that isn't a long-term solution.
This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge.
Strange, as the worker dockerfile doesn't include dask-ml: https://github.com/rsignell-usgs/helm-chart/blob/94ca64191b9e4ab12ba455852c2ed85a915cd51b/docker-images/worker/Dockerfile
My diagnosis may be incorrect then.
On Mon, Jul 23, 2018 at 4:05 PM Rich Signell [email protected] wrote:
This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge https://github.com/rsignell-usgs/helm-chart/blob/conda-forge/docker-images/notebook/Dockerfile .
Ah, of course my diagnosis is incorrect, since the example doesn't actually require dask-ml, just scikit-learn and distributed.
I'll do some further debugging...
@TomAugspurger, we actually are using the notebook image for the workers too, so that old worker Dockerfile is misleading. The notebook environment contains dask-ml, which is required by the example notebook.
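For anyone debugging a similar client/worker environment mismatch, here is a minimal sketch of one way to compare installed package versions between the notebook and the workers. The helper names `installed_versions` and `find_mismatches` are hypothetical and not part of any library, and the package list is just a guess at what matters for this notebook. It assumes Python 3.8+ for `importlib.metadata`, which is newer than the Python 3.6 shown in the traceback above.

```python
from importlib import metadata


def installed_versions(distributions=("joblib", "scikit-learn", "distributed", "dask-ml")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(local, remote_by_worker):
    """Return {worker: {name: (local_version, worker_version)}} for entries that differ."""
    mismatches = {}
    for worker, remote in remote_by_worker.items():
        diff = {
            name: (local_version, remote.get(name))
            for name, local_version in local.items()
            if remote.get(name) != local_version
        }
        if diff:
            mismatches[worker] = diff
    return mismatches


# Possible usage against a running cluster (assumes an existing
# dask.distributed Client named `client`; not executed here):
#
#   local = installed_versions()
#   remote = client.run(installed_versions)  # dict keyed by worker address
#   print(find_mismatches(local, remote))
```

A worker that reports `None` for `joblib` and `scikit-learn` would be consistent with the RuntimeError above, since that error is raised while the worker unpickles the task. `Client.get_versions(check=True)` in distributed does a similar comparison for a fixed set of core packages, if I remember correctly, but it may not cover `dask-ml`.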