auto-sklearn
Directory Not Empty Error Notebook in JupyterLab environment (Docker)
Describe the bug
It appears auto-sklearn wants to delete the Python temp dir instead of using the provided directories.
Code Snippet
from autosklearn import regression  # import assumed; not shown in the original report

automl = regression.AutoSklearnRegressor(
    tmp_folder='/users/jihh/automl/auto-sklearn/temp_housing/',
    output_folder='/users/jihh/automl/auto-sklearn/out_housing/',
    delete_tmp_folder_after_terminate=False,
)
automl.fit(X_train, y_train)
Error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-11-7a52a5d8533a> in <module>
----> 1 automl.fit(X_train, y_train)
~/.local/lib/python3.8/site-packages/autosklearn/estimators.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name)
719 # Fit is supposed to be idempotent!
720 # But not if we use share_mode.
--> 721 super().fit(
722 X=X,
723 y=y,
~/.local/lib/python3.8/site-packages/autosklearn/estimators.py in fit(self, **kwargs)
346 output_folder=self.output_folder,
347 )
--> 348 self.automl_.fit(load_models=self._load_models, **kwargs)
349
350 return self
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models)
1264 self._metric = r2
1265
-> 1266 return super().fit(
1267 X, y,
1268 X_test=X_test,
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in fit(self, X, y, task, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models)
531 # == Perform dummy predictions
532 num_run = 1
--> 533 self._do_dummy_prediction(datamanager, num_run)
534
535 # = Create a searchspace
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in _do_dummy_prediction(self, datamanager, num_run)
315 **self._resampling_strategy_arguments)
316
--> 317 status, cost, runtime, additional_info = ta.run(num_run, cutoff=self._time_for_task)
318 if status == StatusType.SUCCESS:
319 self._logger.info("Finished creating dummy predictions.")
~/.local/lib/python3.8/site-packages/autosklearn/evaluation/__init__.py in run(self, config, instance, cutoff, seed, budget, instance_specific)
274
275 obj = pynisher.enforce_limits(**arguments)(self.ta)
--> 276 obj(**obj_kwargs)
277
278 if obj.exit_status in (pynisher.TimeoutException,
~/.local/lib/python3.8/site-packages/pynisher/limit_function_call.py in __call__(self2, *args, **kwargs)
279 self2.stderr = fh.read()
280
--> 281 tmp_dir.cleanup()
282
283 # don't leave zombies behind
/opt/conda/lib/python3.8/tempfile.py in cleanup(self)
829 def cleanup(self):
830 if self._finalizer.detach():
--> 831 self._rmtree(self.name)
/opt/conda/lib/python3.8/tempfile.py in _rmtree(cls, name)
811 raise
812
--> 813 _shutil.rmtree(name, onerror=onerror)
814
815 @classmethod
/opt/conda/lib/python3.8/shutil.py in rmtree(path, ignore_errors, onerror)
717 os.rmdir(path)
718 except OSError:
--> 719 onerror(os.rmdir, path, sys.exc_info())
720 else:
721 try:
/opt/conda/lib/python3.8/shutil.py in rmtree(path, ignore_errors, onerror)
715 _rmtree_safe_fd(fd, path, onerror)
716 try:
--> 717 os.rmdir(path)
718 except OSError:
719 onerror(os.rmdir, path, sys.exc_info())
OSError: [Errno 39] Directory not empty: '/data/shared/tmp/tmpg00q7u62'
Contents of /data/shared/tmp/tmpg00q7u62:
jihh@:auto-sklearn$> ls -al /data/shared/tmp/tmpg00q7u62
total 1604
drwx------ 2 jihh mlp-discovery-users 0 Nov 13 04:48 .
drwxrwx--T 4135 nobody mlp-discovery-users 212765 Nov 13 04:48 ..
Contents of /users/jihh/automl/auto-sklearn/temp_housing/:
jihh@:auto-sklearn$> ls -al /users/jihh/automl/auto-sklearn/temp_housing/
total 88
drwxr-xr-x 3 jihh mlp-discovery-users 95 Nov 13 04:48 .
drwxr-xr-x 5 jihh mlp-discovery-users 179 Nov 13 04:49 ..
-rw-r--r-- 1 jihh mlp-discovery-users 15546 Nov 13 04:48 'AutoML(1):8ae1121ed217904c992ab3815468796a.log'
drwxr-xr-x 3 jihh mlp-discovery-users 128 Nov 13 04:48 .auto-sklearn
To Reproduce
Run the notebook in a JupyterLab environment.
Expected behavior
I expected auto-sklearn not to try to manage directories that it doesn't need to, since tmp_folder and output_folder were provided.
Actual behavior, stacktrace or logfile
Environment and installation:
Please give details about your installation:
JupyterLab running a version of the DataScience Notebook image. See auto_sklearn.log for version information.
Thanks a lot, @CrosbyMonk, for reporting this issue. The directory being deleted appears to be one that pynisher creates to store the output of a subprocess, so it is expected that Auto-sklearn tries to delete it.
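The traceback shows the cleanup going through `tempfile.py`, and the failing path sits under /data/shared/tmp rather than under `tmp_folder`. Assuming pynisher creates that directory with Python's default temp location (not confirmed in this thread), a possible workaround is to point the default at a local, non-shared directory before calling `fit`. The helper name `use_local_tmp` and the example path are hypothetical; this is a minimal sketch, not a verified fix:

```python
import os
import tempfile


def use_local_tmp(local_dir):
    """Make local_dir Python's default temp location for this process.

    Returns the parent directory of a probe TemporaryDirectory so the
    caller can confirm where new temp dirs are now created.
    """
    os.makedirs(local_dir, exist_ok=True)
    # tempfile consults this module-level variable before TMPDIR and friends.
    tempfile.tempdir = local_dir
    with tempfile.TemporaryDirectory() as probe:
        return os.path.dirname(probe)


# Example (hypothetical path): call once at the top of the notebook.
# use_local_tmp("/tmp/automl-scratch")
```

Setting the `TMPDIR` environment variable before starting the kernel should have a similar effect, since `tempfile.gettempdir()` reads it when `tempfile.tempdir` is unset.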
The open question is why the temporary directory is not empty. Can you still see the directory and any files in it? Maybe the cleanup and the join need to be switched (https://github.com/automl/pynisher/blob/master/pynisher/limit_function_call.py#L281)?
The directory has been empty since the process died. See the ls output above for /data/shared/tmp/tmpg00q7u62.
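A directory that reports "not empty" during removal but looks empty moments later suggests a timing issue, for example a subprocess or a shared filesystem still releasing files when `rmdir` runs. That reading is a guess, not something established in this thread. Under that assumption, a cleanup that retries on `ENOTEMPTY` (errno 39 on Linux, as in the traceback) could look like the sketch below; `rmtree_with_retry` and its parameters are hypothetical and not part of pynisher or auto-sklearn:

```python
import errno
import os
import shutil
import tempfile
import time


def rmtree_with_retry(path, attempts=5, delay=0.2):
    """Remove path recursively, retrying when the OS reports ENOTEMPTY.

    Returns True once the directory is gone. Any other OSError, or
    ENOTEMPTY on the final attempt, is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already removed, possibly by an earlier partial attempt.
            return True
        except OSError as exc:
            if exc.errno != errno.ENOTEMPTY or attempt == attempts - 1:
                raise
            time.sleep(delay)  # give lingering files time to disappear
    return False


# Usage on a throwaway directory:
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "stdout.txt"), "w") as fh:
    fh.write("subprocess output")
removed = rmtree_with_retry(scratch)
```

Retrying only masks the symptom; if the ordering of the subprocess join and the cleanup is the real cause, fixing that order would be the cleaner change.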
Hey, can you reproduce this consistently, or did it happen only once?
Apparently I missed your comment. It is 100% reproducible for me. The data science notebook base is jupyter/datascience-notebook:7e07b801d92b, with the following additional packages installed.
docker
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
less \
apt-transport-https \
apt-utils \
build-essential \
curl \
freeglut3-dev \
gdebi-core \
git \
graphviz \
krb5-config \
krb5-user \
libclang-dev \
libcurl4-openssl-dev \
libedit2 \
libnlopt-dev \
libsasl2-dev \
libsasl2-modules-gssapi-mit \
libspatialindex-dev \
libkrb5-dev \
libssl1.1 \
libssl-dev \
libxml2-dev \
netcat \
net-tools \
openssh-server \
psmisc \
rsync \
sf-dpl \
vim \
tesseract-ocr-all \
xvfb \
&& apt upgrade -y \
&& apt-get autoclean \
&& apt-get clean \
&& apt-get autoremove -y
and
docker
RUN python3 -m pip --no-cache-dir install --upgrade \
bs4 \
cloudpickle \
configparser \
cython \
flask \
graphviz \
impyla \
ipywidgets \
kerberos \
matplotlib \
numpy \
pandas \
pandasql \
pytest \
sasl \
scikit-learn \
scipy \
setuptools \
thrift \
thrift_sasl==0.2.1
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.