auto-sklearn
Deadlock
Hi everybody,
Lately, I have been running into deadlocks with AutoSklearn where it never finishes running. When I cancel it, I get the following exception. It looks like it is stuck waiting:
Traceback (most recent call last):
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 317, in start_log_server
receiver.serve_until_stopped()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 347, in serve_until_stopped
rd, wr, ex = select.select([self.socket.fileno()], [], [], self.timeout)
KeyboardInterrupt
^C^CTraceback (most recent call last):
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 899, in fit
) = _proc_smac.run_smbo()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/smbo.py", line 552, in run_smbo
smac.optimize()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/facade/smac_ac_facade.py", line 720, in optimize
incumbent = self.solver.run()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 316, in run
self._incorporate_run_results(run_info, result, time_left)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 552, in _incorporate_run_results
smbo=self, run_info=run_info, result=result, time_left=time_left
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 170, in __call__
self.build_ensemble(smbo.tae_runner.client)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 251, in build_ensemble
logger_port=self.logger_port,
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/single_thread_client.py", line 89, in submit
return DummyFuture(func(*args, **kwargs))
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 391, in fit_and_return_ensemble
pynisher_context=pynisher_context,
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/builder.py", line 333, in run
safe_ensemble_script(time_left, iteration)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/pynisher/limit_function_call.py", line 305, in __call__
subproc.join()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/neutatz/Software/GreenAutoML/fastsklearnfeature/declarative_automl/optuna_package/myautoml/analysis/parallel_autosklearn2_new/check_model_parallel.py", line 177, in <module>
automl.fit(X_train_sample.copy(), y_train_sample.copy(), feat_type=feat_type)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/estimators.py", line 1454, in fit
dataset_name=dataset_name,
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/estimators.py", line 540, in fit
self.automl_.fit(load_models=self.load_models, **kwargs)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 2313, in fit
is_classification=True,
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 964, in fit
self._fit_cleanup()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 1069, in _fit_cleanup
self._clean_logger()
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 421, in _clean_logger
self.logging_server.join(timeout=5)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 45, in wait
if not wait([self.sentinel], timeout):
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/connection.py", line 921, in wait
ready = selector.select(timeout)
File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/threading.py'>
Traceback (most recent call last)
Do you have any idea what I am doing wrong?
Thank you for your help.
Best regards, Felix
Heyo, my guess is dask and a recent change they made. You could try downgrading dask to 2023.3.2? It's just a hunch, but I imagine the problem comes from how auto-sklearn does logging for multiprocessing and how dask no longer allows passing system resources to workers.
- https://github.com/dask/distributed/issues/7792
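
If it helps, here is a rough sketch for checking whether the dask in your environment is newer than the 2023.3.2 I mentioned. The helper names (`release_tuple`, `is_newer_than_suggested`, `installed_dask_version`) are made up for this example and are not part of auto-sklearn or dask. It only compares the first three numeric parts of the version string. Whether the downgrade actually removes the hang is still just my hunch.

```python
import re

# Version named in the reply above; treat it as a suggestion, not a verified fix.
SUGGESTED_DASK = "2023.3.2"


def release_tuple(version):
    """Turn a string such as '2023.3.2' into (2023, 3, 2)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_newer_than_suggested(installed, suggested=SUGGESTED_DASK):
    """True if the installed release is strictly newer than the suggested one."""
    return release_tuple(installed) > release_tuple(suggested)


def installed_dask_version():
    """Return dask.__version__, or None if dask cannot be imported."""
    try:
        import dask
    except ImportError:
        return None
    return dask.__version__


if __name__ == "__main__":
    version = installed_dask_version()
    if version is None:
        print("dask is not installed in this environment")
    elif is_newer_than_suggested(version):
        print(f"dask {version} is newer than {SUGGESTED_DASK}; "
              f"you could try: pip install 'dask=={SUGGESTED_DASK}'")
    else:
        print(f"dask {version} is not newer than {SUGGESTED_DASK}")
```

Run it inside the same conda environment that auto-sklearn uses. If you pin dask, you will probably want `distributed` pinned to a matching release as well, though I have not checked which combination auto-sklearn needs.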