auto-sklearn

Deadlock

Open FelixNeutatz opened this issue 2 years ago • 1 comment

Hi everybody,

Lately, I have been running into deadlocks with AutoSklearn where it never finishes running. When I cancel it, I get the following exception. It looks like it is stuck waiting:

Traceback (most recent call last):                                                                                                                                                                          
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()                                                                                                                                                                                              
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)                                                                                                                                                               
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 317, in start_log_server
    receiver.serve_until_stopped()                                                                                                                                                                          
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 347, in serve_until_stopped
    rd, wr, ex = select.select([self.socket.fileno()], [], [], self.timeout)                                                                                                                                
KeyboardInterrupt                         
^C^CTraceback (most recent call last):                                                                                                                                                                      
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 899, in fit
    ) = _proc_smac.run_smbo()                                                                         
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/smbo.py", line 552, in run_smbo
    smac.optimize()
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/facade/smac_ac_facade.py", line 720, in optimize
    incumbent = self.solver.run() 
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 316, in run
    self._incorporate_run_results(run_info, result, time_left)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 552, in _incorporate_run_results
    smbo=self, run_info=run_info, result=result, time_left=time_left
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 170, in __call__
    self.build_ensemble(smbo.tae_runner.client)                                                                                                                                                             
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 251, in build_ensemble
    logger_port=self.logger_port,
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/util/single_thread_client.py", line 89, in submit
    return DummyFuture(func(*args, **kwargs))
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/manager.py", line 391, in fit_and_return_ensemble
    pynisher_context=pynisher_context,
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/ensemble_building/builder.py", line 333, in run
    safe_ensemble_script(time_left, iteration)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/pynisher/limit_function_call.py", line 305, in __call__
    subproc.join()
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

During handling of the above exception, another exception occurred:


Traceback (most recent call last):
  File "/home/neutatz/Software/GreenAutoML/fastsklearnfeature/declarative_automl/optuna_package/myautoml/analysis/parallel_autosklearn2_new/check_model_parallel.py", line 177, in <module>
    automl.fit(X_train_sample.copy(), y_train_sample.copy(), feat_type=feat_type)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/estimators.py", line 1454, in fit
    dataset_name=dataset_name,
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/estimators.py", line 540, in fit
    self.automl_.fit(load_models=self.load_models, **kwargs)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 2313, in fit
    is_classification=True,
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 964, in fit
    self._fit_cleanup()
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 1069, in _fit_cleanup
    self._clean_logger()
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/site-packages/autosklearn/automl.py", line 421, in _clean_logger
    self.logging_server.join(timeout=5)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/popen_fork.py", line 45, in wait
    if not wait([self.sentinel], timeout):
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/home/neutatz/anaconda3/envs/autosklearn/lib/python3.7/threading.py'>
Traceback (most recent call last)

Do you have any idea what I am doing wrong?

Thank you for your help.

Best regards, Felix

FelixNeutatz, May 02 '23 13:05

Heyo, my guess is dask and a recent change they made. You could try downgrading dask to 2023.3.2. It's just a hunch, but I imagine the problem comes from how auto-sklearn does logging for multiprocessing and how dask no longer allows passing system resources to workers.

  • https://github.com/dask/distributed/issues/7792
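
To check whether an environment is affected by this hunch, a quick sketch like the one below compares the installed dask version against 2023.3.2. The helper names (`version_tuple`, `is_newer_than_pin`) are made up for illustration and are not part of dask or auto-sklearn. Whether `distributed` also needs to be pinned to the same version is an assumption that this sketch does not check.

```python
import re

PINNED = "2023.3.2"  # the dask version suggested in the reply above


def version_tuple(version):
    """Turn a version string such as '2023.3.2' into (2023, 3, 2).

    Only the first three numeric components are used; any suffix is ignored.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_newer_than_pin(installed, pinned=PINNED):
    """Return True if the installed version is strictly newer than the pin."""
    return version_tuple(installed) > version_tuple(pinned)


if __name__ == "__main__":
    try:
        import dask
    except ImportError:
        print("dask is not installed in this environment")
    else:
        if is_newer_than_pin(dask.__version__):
            # Downgrading is only a guess at a workaround, not a confirmed fix.
            print(
                f"dask {dask.__version__} is newer than {PINNED}; "
                f"you could try: pip install 'dask=={PINNED}'"
            )
        else:
            print(f"dask {dask.__version__} is not newer than {PINNED}")
```

If the script reports a newer version, downgrading and re-running the same `automl.fit(...)` call would show whether the hang goes away.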

eddiebergman, May 02 '23 19:05