pertpy Synchronous dataloader write fails

Open · yugeji opened this issue 1 year ago · 1 comment

Report

Setup: python==3.9.0, then install pertpy and install snakemake.

Within the Snakefile:

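rule prepare_data:
        # TMPDIR is assumed to be a pathlib.Path defined elsewhere in the Snakefile
        output: TMPDIR / 'prepared_{dataset}.h5ad'
        resources:
                time='8:00:00',
                mem_mb=64000,
                disk_mb=64000
        run:
                import os
                # Set before pertpy (and therefore h5py) is imported
                os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
                import pertpy as pt
                dataset = wildcards.dataset

                if dataset in ['sciplex_K562', 'sciplex_A549', 'sciplex_MCF7']:
                        cell_line = dataset.split('_')[1]  # e.g. 'K562'
                        # Downloads the dataset if it is not cached yet; all three jobs reach this call at once
                        adata = pt.data.srivatsan_2020_sciplex3()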

Because the rule ran for all three dataset values at the same time, pt.data.srivatsan_2020_sciplex3() was called in three different threads concurrently. Since the file had not been pre-downloaded, all three threads started downloading it, and the lock placed on the file prevented any of them from completing the download. Adding the os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE" line does not fix this.
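A general workaround for this kind of race (a swapped-in technique, not something this report shows pertpy doing) is to have every job download into its own temporary file next to the target and then atomically move it into place with `os.replace`, so no job ever writes to, or reads, a half-finished shared file. The minimal sketch below assumes a hypothetical `fetch()` callable standing in for the real download; `atomic_download` and `run_concurrently` are illustrative names only.

```python
import os
import tempfile
import threading
from typing import Callable


def atomic_download(fetch: Callable[[], bytes], dest: str) -> str:
    """Write fetch()'s bytes to dest without ever exposing a partial file.

    `fetch` is a hypothetical stand-in for the real download. Each caller
    writes to its own uniquely named temporary file, so concurrent callers
    never share a half-written path.
    """
    if os.path.exists(dest):
        return dest  # an earlier job already finished the download
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(fetch())
        # On POSIX, os.replace is an atomic rename when both paths are on the same filesystem
        os.replace(tmp_path, dest)
    except BaseException:
        # Remove only our own temporary file if the download or write failed
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest


def run_concurrently(target: Callable[[], object], n: int) -> None:
    """Start n threads that all run target() and wait for them to finish."""
    threads = [threading.Thread(target=target) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target_path = os.path.join(workdir, "dataset.h5ad")
        # Three "jobs" fetch the same file at once, as in the report
        run_concurrently(lambda: atomic_download(lambda: b"payload", target_path), 3)
        with open(target_path, "rb") as fh:
            print(fh.read())
```

The trade-off is redundant work: every job that starts before the file exists still downloads it in full, but none of them blocks or corrupts the others.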

Version information

pertpy 0.5.0 session_info 1.0.0

IPython 8.16.1 PIL 10.1.0 absl NA adjustText 0.8 aiohttp 3.8.6 aiosignal 1.3.1 anndata 0.10.2 annotated_types 0.6.0 anyio NA arviz 0.16.1 asttokens NA async_timeout 4.0.3 attr 23.1.0 backcall 0.2.0 backoff 2.2.1 brotli 1.1.0 bs4 4.12.2 certifi 2023.07.22 cffi 1.16.0 charset_normalizer 3.3.1 chex 0.1.7 click 8.1.7 colorama 0.4.6 comm 0.1.4 contextlib2 NA croniter NA cycler 0.12.1 cython_runtime NA dateutil 2.8.2 decorator 5.1.1 decoupler 1.5.0 deepdiff 6.6.1 defusedxml 0.7.1 docrep 0.3.2 etils 1.5.1 exceptiongroup 1.1.3 executing 2.0.0 fastapi 0.104.0 flax 0.7.4 frozenlist 1.4.0 fsspec 2023.10.0 google NA h5py 3.10.0 idna 3.4 igraph 0.10.8 importlib_metadata NA importlib_resources NA ipywidgets 8.1.1 jax 0.4.19 jaxlib 0.4.19 jaxopt NA jedi 0.19.1 joblib 1.3.2 kiwisolver 1.4.5 leidenalg 0.10.1 lightning 2.0.9.post0 lightning_cloud 0.5.43 lightning_fabric 2.1.0 lightning_utilities 0.9.0 llvmlite 0.40.1 matplotlib 3.8.0 mizani 0.9.3 ml_collections NA ml_dtypes 0.3.1 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.7 mudata 0.2.3 multidict 6.0.4 multipart 0.0.6 multipledispatch 0.6.0 natsort 8.4.0 numba 0.57.1 numpy 1.24.0 numpyro 0.13.2 opt_einsum v3.3.0 optax 0.1.7 ordered_set 4.1.0 ott 0.4.4 packaging 23.2 pandas 2.1.1 parso 0.8.3 patsy 0.5.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA plotnine 0.12.3 ply 3.11 prompt_toolkit 3.0.39 psutil 5.9.5 ptyprocess 0.7.0 pure_eval 0.2.2 pycparser 2.21 pydantic 2.1.1 pydantic_core 2.4.0 pygments 2.16.1 pyomo 6.6.2 pyparsing 3.1.1 pyro 1.8.6 pytorch_lightning 2.1.0 pytz 2023.3.post1 requests 2.31.0 rich NA scanpy 1.9.5 scipy 1.11.3 scvi 1.0.4 seaborn 0.11.2 setuptools 68.2.2 six 1.16.0 sklearn 1.3.2 skmisc 0.3.0 sniffio 1.3.0 socks 1.7.1 soupsieve 2.5 sparse 0.14.0 sparsecca 0.3.1 stack_data 0.6.3 starlette 0.27.0 statsmodels 0.14.0 switchlang 0.1.0 sympy 1.12 texttable 1.7.0 threadpoolctl 3.2.0 tomli 2.0.1 toolz 0.12.0 torch 2.1.0+cu121 torchgen NA torchmetrics 1.2.0 tqdm 4.66.1 traitlets 5.11.2 tree 0.1.8 typing_extensions NA 
urllib3 1.26.18 uvicorn 0.23.2 wcwidth 0.2.8 websocket 1.6.4 websockets 12.0 wrapt 1.15.0 xarray 2023.10.1 xarray_einstats 0.6.0 yaml 6.0.1 yarl 1.9.2 zipp NA zoneinfo NA

Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) [GCC 12.3.0] Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.28

Session information updated at 2023-10-24 21:56

Oct 24 '23 20:10 yugeji

I think a FileLock might help here. Otherwise, the datasets need to be pre-downloaded.
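The FileLock idea can be sketched with the standard library: take an exclusive advisory lock on a sidecar `<dest>.lock` file, check whether the download already exists, and fetch it only if it does not, so the first job downloads while the others wait and then reuse the result. This is a minimal sketch assuming Linux or macOS (`fcntl.flock` is POSIX-only), a hypothetical `fetch()` callable in place of the real download, and illustrative names `download_once` and `demo`. The third-party `filelock` package provides a cross-platform `FileLock` context manager for the same purpose. For jobs running on different cluster nodes, the lock file has to sit on a filesystem they all share, and advisory locking on network filesystems can be unreliable.

```python
import fcntl
import os
import tempfile
import threading
from typing import Callable, List, Tuple


def download_once(fetch: Callable[[], bytes], dest: str) -> bool:
    """Fetch into dest unless another caller already has; True if this call fetched.

    `fetch` is a hypothetical stand-in for the real download. The exclusive
    flock on "<dest>.lock" ensures only one caller downloads at a time.
    """
    with open(dest + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another caller holds the lock
        try:
            if os.path.exists(dest):
                return False  # another caller finished while we were waiting
            tmp_path = dest + ".part"
            with open(tmp_path, "wb") as fh:
                fh.write(fetch())
            os.replace(tmp_path, dest)  # publish only a complete file
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def demo(n_jobs: int = 3) -> Tuple[int, List[bool]]:
    """Run n_jobs concurrent download_once calls; return (fetch count, per-job results)."""
    fetch_calls: List[int] = []
    results: List[bool] = []

    def fake_fetch() -> bytes:
        fetch_calls.append(1)
        return b"payload"

    with tempfile.TemporaryDirectory() as workdir:
        dest = os.path.join(workdir, "dataset.h5ad")
        threads = [
            threading.Thread(target=lambda: results.append(download_once(fake_fetch, dest)))
            for _ in range(n_jobs)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return len(fetch_calls), results
```

Inside the Snakefile, the same pattern would mean holding such a lock around the `pt.data.srivatsan_2020_sciplex3()` call, on the assumption that the loader reuses a file already present in its cache instead of downloading it again.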

Nov 03 '23 13:11 Zethson
