pertpy
Simultaneous dataloader writes fail
Report
Setup: create an environment with python==3.9.0, then install pertpy and snakemake.
Within the Snakefile:
rule prepare_data:
    output: TMPDIR / 'prepared_{dataset}.h5ad'
    resources:
        time='8:00:00',
        mem_mb=64000,
        disk_mb=64000
    run:
        # IMPORTS HERE
        import os
        # Set before pertpy (and therefore h5py) is imported so HDF5 sees it
        os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
        import pertpy as pt
        dataset = wildcards.dataset
        if dataset in ['sciplex_K562', 'sciplex_A549', 'sciplex_MCF7']:
            cell_line = dataset.split('_')[1]
            # All three cell lines load the same Sci-Plex 3 file
            adata = pt.data.srivatsan_2020_sciplex3()
Because all three dataset values were run at the same time, pt.data.srivatsan_2020_sciplex3() was called from three different threads. Since the file had not been downloaded beforehand, all three threads started downloading it to the same location. The resulting lock on the file prevented any of the threads from completing the download. Setting os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE" does not fix this.
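Independent of any locking, one common way to keep concurrent writers from corrupting a shared download target is to have each writer download into its own temporary file and then move it into place with os.replace, which is atomic on POSIX when both paths are on the same filesystem. The sketch below assumes a hypothetical fetch callback that writes the dataset to the path it is given; download_atomically is also a made-up name. It illustrates the pattern and is not pertpy's downloader.

```python
import os
import tempfile
import threading
from pathlib import Path
from typing import Callable


def download_atomically(fetch: Callable[[Path], None], dest: Path) -> Path:
    """Download into a private temp file, then atomically move it to dest.

    fetch is a hypothetical callback standing in for the real downloader.
    """
    if dest.exists():
        return dest  # an earlier job already finished the download
    dest.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives every caller its own uniquely named file in the same
    # directory, so concurrent downloads never write to the same path.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(tmp)
        # Readers see either no file or a complete file, never a partial one.
        os.replace(tmp, dest)
    finally:
        if tmp.exists():  # only true if fetch raised before the replace
            tmp.unlink()
    return dest


# Usage: three threads race to fetch the same (fake) dataset.
def fake_fetch(path: Path) -> None:
    path.write_bytes(b"x" * 1_000_000)

with tempfile.TemporaryDirectory() as d:
    dest = Path(d) / "sciplex3.h5ad"
    threads = [
        threading.Thread(target=download_atomically, args=(fake_fetch, dest))
        for _ in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    final_size = dest.stat().st_size
    leftovers = sorted(p.name for p in Path(d).iterdir())

print(final_size, leftovers)
```

This still lets every job download the full file; it only guarantees that no job ever reads a half-written one. Downloading only once requires a lock.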
Version information
pertpy 0.5.0 session_info 1.0.0
IPython 8.16.1 PIL 10.1.0 absl NA adjustText 0.8 aiohttp 3.8.6 aiosignal 1.3.1 anndata 0.10.2 annotated_types 0.6.0 anyio NA arviz 0.16.1 asttokens NA async_timeout 4.0.3 attr 23.1.0 backcall 0.2.0 backoff 2.2.1 brotli 1.1.0 bs4 4.12.2 certifi 2023.07.22 cffi 1.16.0 charset_normalizer 3.3.1 chex 0.1.7 click 8.1.7 colorama 0.4.6 comm 0.1.4 contextlib2 NA croniter NA cycler 0.12.1 cython_runtime NA dateutil 2.8.2 decorator 5.1.1 decoupler 1.5.0 deepdiff 6.6.1 defusedxml 0.7.1 docrep 0.3.2 etils 1.5.1 exceptiongroup 1.1.3 executing 2.0.0 fastapi 0.104.0 flax 0.7.4 frozenlist 1.4.0 fsspec 2023.10.0 google NA h5py 3.10.0 idna 3.4 igraph 0.10.8 importlib_metadata NA importlib_resources NA ipywidgets 8.1.1 jax 0.4.19 jaxlib 0.4.19 jaxopt NA jedi 0.19.1 joblib 1.3.2 kiwisolver 1.4.5 leidenalg 0.10.1 lightning 2.0.9.post0 lightning_cloud 0.5.43 lightning_fabric 2.1.0 lightning_utilities 0.9.0 llvmlite 0.40.1 matplotlib 3.8.0 mizani 0.9.3 ml_collections NA ml_dtypes 0.3.1 mpl_toolkits NA mpmath 1.3.0 msgpack 1.0.7 mudata 0.2.3 multidict 6.0.4 multipart 0.0.6 multipledispatch 0.6.0 natsort 8.4.0 numba 0.57.1 numpy 1.24.0 numpyro 0.13.2 opt_einsum v3.3.0 optax 0.1.7 ordered_set 4.1.0 ott 0.4.4 packaging 23.2 pandas 2.1.1 parso 0.8.3 patsy 0.5.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA plotnine 0.12.3 ply 3.11 prompt_toolkit 3.0.39 psutil 5.9.5 ptyprocess 0.7.0 pure_eval 0.2.2 pycparser 2.21 pydantic 2.1.1 pydantic_core 2.4.0 pygments 2.16.1 pyomo 6.6.2 pyparsing 3.1.1 pyro 1.8.6 pytorch_lightning 2.1.0 pytz 2023.3.post1 requests 2.31.0 rich NA scanpy 1.9.5 scipy 1.11.3 scvi 1.0.4 seaborn 0.11.2 setuptools 68.2.2 six 1.16.0 sklearn 1.3.2 skmisc 0.3.0 sniffio 1.3.0 socks 1.7.1 soupsieve 2.5 sparse 0.14.0 sparsecca 0.3.1 stack_data 0.6.3 starlette 0.27.0 statsmodels 0.14.0 switchlang 0.1.0 sympy 1.12 texttable 1.7.0 threadpoolctl 3.2.0 tomli 2.0.1 toolz 0.12.0 torch 2.1.0+cu121 torchgen NA torchmetrics 1.2.0 tqdm 4.66.1 traitlets 5.11.2 tree 0.1.8 typing_extensions NA 
urllib3 1.26.18 uvicorn 0.23.2 wcwidth 0.2.8 websocket 1.6.4 websockets 12.0 wrapt 1.15.0 xarray 2023.10.1 xarray_einstats 0.6.0 yaml 6.0.1 yarl 1.9.2 zipp NA zoneinfo NA
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 03:49:32) [GCC 12.3.0] Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.28
Session information updated at 2023-10-24 21:56
I think a FileLock might help here. If not, the datasets need to be pre-downloaded before the jobs run in parallel.
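The lock suggested above can be sketched with the standard library alone. This version uses fcntl.flock on a separate .lock file as a Unix-only stand-in for FileLock from the filelock package (with that package, a FileLock on the same lock path, used as a context manager, would take the place of exclusive_lock). The names exclusive_lock, download_once and the fetch callback are hypothetical; pertpy's actual downloader may work differently.

```python
import fcntl
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def exclusive_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive flock on a separate lock file (Unix only)."""
    # Each caller opens its own descriptor, so flock() serializes threads
    # within one process as well as separate processes.
    with open(lock_path, "a") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def download_once(fetch: Callable[[Path], None], dest: Path) -> Path:
    """Let exactly one worker download dest; the others wait, then reuse it."""
    lock_path = dest.with_name(dest.name + ".lock")
    with exclusive_lock(lock_path):
        # Check again inside the lock: another worker may have finished
        # the download while this one was waiting.
        if not dest.exists():
            fetch(dest)
    return dest


# Usage: three threads, one per Sci-Plex cell line, share one download.
calls = []

def fake_fetch(path: Path) -> None:
    calls.append(path)  # record how often the "download" actually runs
    path.write_bytes(b"sciplex3")

with tempfile.TemporaryDirectory() as d:
    dest = Path(d) / "sciplex3.h5ad"
    workers = [
        threading.Thread(target=download_once, args=(fake_fetch, dest))
        for _ in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    content = dest.read_bytes()

print(len(calls))  # 1
```

The re-check inside the lock is what turns "serialize the downloads" into "download once". If fetch can fail partway, it should write to a temporary name and rename the result into place, otherwise a partial file would be mistaken for a finished download by the waiting workers.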