rapids-single-cell-examples

CUDARuntimeError in Notebooks

Open Intron7 opened this issue 3 years ago • 15 comments

Hello everyone,

I'm having trouble running the notebooks on our institute's server (64-core EPYC and 2 Quadro RTX 6000), whereas on my personal computer at home (AMD 5950X and RTX 3090) the notebooks run perfectly. If I run the 1M Brain GPU notebook, it crashes once it reaches the line sparse_gpu_array = cp.sparse.csr_matrix(adata.X[:USE_FIRST_N_CELLS], dtype=cp.float32):

---------------------------------------------------------------------------
CUDARuntimeError                          Traceback (most recent call last)
<timed exec> in <module>

~/conda/rapids-0.18-10/lib/python3.8/site-packages/cupyx/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
    351             x = arg1.asformat(self.format)
    352             data = cupy.array(x.data)
--> 353             indices = cupy.array(x.indices, dtype='i')
    354             indptr = cupy.array(x.indptr, dtype='i')
    355             copy = False

~/conda/rapids-0.18-10/lib/python3.8/site-packages/cupy/_creation/from_data.py in array(obj, dtype, copy, order, subok, ndmin)
     39 
     40     """
---> 41     return core.array(obj, dtype, copy, order, subok, ndmin)
     42 
     43 

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core._send_object_to_gpu()

cupy/core/core.pyx in cupy.core.core._alloc_async_transfer_buffer()

cupy/core/core.pyx in cupy.core.core._alloc_async_transfer_buffer()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.alloc_pinned_memory()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory._malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory._malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemory.__init__()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.hostAlloc()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.check_status()

CUDARuntimeError: cudaErrorOperatingSystem: OS call failed or operation not supported on this OS

I can reproduce the same error in the hlca gpu notebook at adata.obsm["X_pca"] = PCA(n_components=n_components, output_type="numpy").fit_transform(adata.X) if I wait some time between the scaling step and the PCA step. If I submit the whole notebook at once, I don't get any issues on the server. So far I have tested these notebooks with rapids-0.18 for CUDA toolkit 10.1 and 11.0. What is the issue here and how can I fix it? I am also confused because both the Quadro RTX 6000 and the RTX 3090 have 24 GB of VRAM. Could this be an issue with memory allocation in rmm? Thank you for your help.
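For anyone reproducing this: the traceback above shows the failure while copying x.indices to the GPU with dtype='i' through a pinned host buffer. Two quantities may therefore be worth checking before the transfer: whether the number of stored values fits in a signed 32-bit index, and how large the three CSR arrays would be. The sketch below is plain Python; the function names fits_int32 and csr_transfer_bytes and the example count are hypothetical and not taken from the notebook, and nothing in this thread confirms that either quantity is the cause.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def fits_int32(nnz: int) -> bool:
    """Return True if `nnz` stored values can be addressed with int32 indices."""
    return nnz <= INT32_MAX


def csr_transfer_bytes(n_rows: int, nnz: int,
                       value_bytes: int = 4, index_bytes: int = 4) -> dict:
    """Estimate the size in bytes of the three CSR arrays copied to the GPU."""
    return {
        "data": nnz * value_bytes,               # float32 values by default
        "indices": nnz * index_bytes,            # column index per stored value
        "indptr": (n_rows + 1) * index_bytes,    # one row pointer per row, plus one
    }


# Hypothetical count; in the notebook it would come from
# adata.X[:USE_FIRST_N_CELLS].nnz
example_nnz = 2_500_000_000
print("fits in int32:", fits_int32(example_nnz))
print(csr_transfer_bytes(1_000_000, example_nnz))
```

If the count does not fit, or the buffers are a large fraction of host RAM, that would be useful to include in the report.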

Intron7 avatar Mar 30 '21 13:03 Intron7

@Intron7 Please provide the OS version used in both cases.

We noticed "OS call failed or operation not supported on this OS" in the error message. RMM is supported on Linux only.
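A quick way to collect these details on each machine is sketched below. It uses only the Python standard library; the helper name describe_os is made up for illustration, and the distribution field is filled in only where platform.freedesktop_os_release() is available (Python 3.10 or newer).

```python
import platform


def describe_os() -> dict:
    """Collect basic OS details that are useful in a bug report."""
    info = {
        "system": platform.system(),    # e.g. "Linux"
        "release": platform.release(),  # kernel release string
        "machine": platform.machine(),  # e.g. "x86_64"
    }
    # freedesktop_os_release() was added in Python 3.10; on older versions,
    # or when no os-release file exists, the distribution is simply left out.
    try:
        os_release = platform.freedesktop_os_release()
        info["distribution"] = os_release.get("PRETTY_NAME", "")
    except (AttributeError, OSError):
        pass
    return info


print(describe_os())
```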

rilango avatar Apr 01 '21 20:04 rilango

Dear Rilango,

thank you for the quick reply. On our institute's server we use Debian 10. On my local machine I use Ubuntu 20.04 LTS.

Intron7 avatar Apr 01 '21 21:04 Intron7

@Intron7, are you using conda? How did you build your environment? Can you provide the output of conda list?

cjnolet avatar Apr 01 '21 22:04 cjnolet

Here is the conda list output:

# packages in environment at /home/sdicks-local/conda/rapids-0.18:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
abseil-cpp                20200225.2           he1b5a44_2    conda-forge
aiohttp                   3.7.4            py38h497a2fe_0    conda-forge
alsa-lib                  1.2.3                h516909a_0    conda-forge
anndata                   0.7.5                    pypi_0    pypi
anyio                     2.2.0            py38h578d9bd_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
argon2-cffi               20.1.0           py38h497a2fe_2    conda-forge
arrow-cpp                 1.0.1           py38hcb5322d_14_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
async-timeout             3.0.1                   py_1000    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     20.3.0             pyhd3deb0d_0    conda-forge
aws-c-common              0.4.59               h36c2ea0_1    conda-forge
aws-c-event-stream        0.1.6                had2084c_6    conda-forge
aws-checksums             0.1.10               h4e93380_0    conda-forge
aws-sdk-cpp               1.8.63               h9b98462_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.3              pyhd8ed1ab_0    conda-forge
blazingsql                0.18.0                   pypi_0    pypi
bleach                    3.3.0              pyh44b312d_0    conda-forge
bokeh                     2.2.3            py38h578d9bd_0    conda-forge
boost                     1.72.0           py38h1e42940_1    conda-forge
boost-cpp                 1.72.0               h9d3c048_4    conda-forge
brotli                    1.0.9                h9c3ff4c_4    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.17.1               h7f98852_1    conda-forge
ca-certificates           2021.1.19            h06a4308_1    defaults
cairo                     1.16.0            h6cf1ce9_1008    conda-forge
certifi                   2020.12.5        py38h06a4308_0    defaults
cffi                      1.14.5           py38ha65f79e_0    conda-forge
cfitsio                   3.470                hb418390_7    conda-forge
chardet                   4.0.0            py38h578d9bd_1    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.1              pyhd8ed1ab_0    conda-forge
cloudpickle               1.6.0                      py_0    conda-forge
colorcet                  2.0.6              pyhd8ed1ab_0    conda-forge
cryptography              3.4.7            py38ha5dfef3_0    conda-forge
cudatoolkit               11.0.221             h6bb024c_0    nvidia
cudf                      0.18.1          cuda_11.0_py38_g999be56c80_0    rapidsai
cudf_kafka                0.18.1          py38_g999be56c80_0    rapidsai
cudnn                     8.0.0                cuda11.0_0    nvidia
cugraph                   0.18.0          py38_g65ec965f_0    rapidsai
cuml                      0.18.0          cuda11.0_py38_gb5f59e005_0    rapidsai
cupy                      8.0.0            py38hb7c6141_0    rapidsai
curl                      7.75.0               h979ede3_0    conda-forge
cusignal                  0.18.0          py38_g42899d2_0    rapidsai
cuspatial                 0.18.0          py38_gf4da460_0    rapidsai
custreamz                 0.18.1          py38_g999be56c80_0    rapidsai
cuxfilter                 0.18.0          py38_gac6f488_0    rapidsai
cycler                    0.10.0                   pypi_0    pypi
cyrus-sasl                2.1.27               h3274739_1    conda-forge
cytoolz                   0.11.0           py38h497a2fe_3    conda-forge
dask                      2021.3.1           pyhd8ed1ab_0    conda-forge
dask-core                 2021.3.1           pyhd8ed1ab_0    conda-forge
dask-cuda                 0.18.0                   py38_0    rapidsai
dask-cudf                 0.18.1          py38_g999be56c80_0    rapidsai
datashader                0.11.1             pyh9f0ad1d_0    conda-forge
datashape                 0.5.4                      py_1    conda-forge
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2021.3.1         py38h578d9bd_0    conda-forge
dlpack                    0.3                  he1b5a44_1    conda-forge
entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
expat                     2.3.0                h9c3ff4c_0    conda-forge
faiss-proc                1.0.0                      cuda    conda-forge
fastavro                  1.3.4            py38h497a2fe_0    conda-forge
fastrlock                 0.6              py38h709712a_0    conda-forge
fiona                     1.8.18           py38h37fbd03_0    conda-forge
fontconfig                2.13.1            hba837de_1004    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
freexl                    1.0.6                h7f98852_0    conda-forge
fsspec                    0.8.7              pyhd8ed1ab_0    conda-forge
future                    0.18.2           py38h578d9bd_3    conda-forge
gdal                      3.1.4            py38h25844d8_2    conda-forge
geopandas                 0.8.1                      py_0    conda-forge
geos                      3.8.1                he1b5a44_0    conda-forge
geotiff                   1.6.0                h5d11630_3    conda-forge
get-version               2.1                      pypi_0    pypi
gettext                   0.19.8.1          h0b5b191_1005    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h36c2ea0_2    conda-forge
glog                      0.4.0                h49b9bf7_3    conda-forge
google-cloud-cpp          1.16.0               he4a878c_2    conda-forge
google-cloud-cpp-common   0.25.0               he83eced_7    conda-forge
googleapis-cpp            0.10.0               h6b1abdc_4    conda-forge
graphite2                 1.3.13            h58526e2_1001    conda-forge
greenlet                  1.0.0            py38h709712a_0    conda-forge
grpc-cpp                  1.32.0               h7997a97_1    conda-forge
h5py                      3.2.1                    pypi_0    pypi
harfbuzz                  2.8.0                h83ec7ef_1    conda-forge
hdf4                      4.2.13            h10796ff_1004    conda-forge
hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       68.1                 h58526e2_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
importlib-metadata        3.9.0            py38h578d9bd_0    conda-forge
ipykernel                 5.3.4            py38h5ca1d4c_0    defaults
ipython                   7.22.0           py38hd0cf306_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.6.3              pyhd3deb0d_0    conda-forge
jedi                      0.18.0           py38h578d9bd_2    conda-forge
jinja2                    2.11.3             pyh44b312d_0    conda-forge
joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
jpype1                    1.2.1            py38h1fd1430_0    conda-forge
json-c                    0.13.1            hbfbb72e_1002    conda-forge
jsonschema                3.2.0              pyhd8ed1ab_3    conda-forge
jupyter-server-proxy      3.0.2              pyhd8ed1ab_0    conda-forge
jupyter_client            6.1.12             pyhd8ed1ab_0    conda-forge
jupyter_core              4.7.1            py38h578d9bd_0    conda-forge
jupyter_server            1.5.1            py38h578d9bd_0    conda-forge
jupyterlab-nvdashboard    0.4.0                    pypi_0    pypi
jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
kealib                    1.4.14               hcc255d8_2    conda-forge
kiwisolver                1.3.1                    pypi_0    pypi
krb5                      1.17.2               h926e7f8_0    conda-forge
lcms2                     2.12                 hddcbb42_0    conda-forge
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
legacy-api-wrap           1.2                      pypi_0    pypi
leidenalg                 0.8.3                    pypi_0    pypi
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libcrc32c                 1.1.1                h9c3ff4c_2    conda-forge
libcudf                   0.18.1          cuda11.0_g999be56c80_0    rapidsai
libcudf_kafka             0.18.1            g999be56c80_0    rapidsai
libcugraph                0.18.0          cuda11.0_g65ec965f_0    rapidsai
libcuml                   0.18.0          cuda11.0_gb5f59e005_0    rapidsai
libcumlprims              0.18.0          cuda11.0_g5939d3e_0    nvidia
libcurl                   7.75.0               hc4aaa36_0    conda-forge
libcuspatial              0.18.0          cuda11.0_gf4da460_0    rapidsai
libdap4                   3.20.6               hd7c4107_2    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libevent                  2.1.10               hcdb4288_3    conda-forge
libfaiss                  1.6.3           h328c4c8_3_cuda    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgcrypt                 1.9.2                h7f98852_0    conda-forge
libgdal                   3.1.4                h02eeb80_2    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libglib                   2.68.0               h3e27bee_2    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
libgpg-error              1.42                 h9c3ff4c_0    conda-forge
libgsasl                  1.8.0                         2    conda-forge
libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libkml                    1.3.0             hd79254b_1012    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libnetcdf                 4.7.4           nompi_h56d31a8_107    conda-forge
libnghttp2                1.43.0               h812cca2_0    conda-forge
libntlm                   1.4               h7f98852_1002    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libpq                     12.3                 h255efa7_3    conda-forge
libprotobuf               3.13.0.1             h8b12597_0    conda-forge
librdkafka                1.5.3                h54cafa9_0    conda-forge
librmm                    0.18.0          cuda11.0_ga4ee6b7_0    rapidsai
librttopo                 1.1.0                hb271727_4    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libspatialindex           1.9.3                h9c3ff4c_3    conda-forge
libspatialite             5.0.1                h6ec7341_0    conda-forge
libssh2                   1.9.0                ha56f1ee_6    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
libthrift                 0.13.0               h5aa387f_6    conda-forge
libtiff                   4.2.0                hdc55705_0    conda-forge
libutf8proc               2.6.1                h7f98852_0    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.41.0               h7f98852_0    conda-forge
libwebp                   1.2.0                h3452ae3_0    conda-forge
libwebp-base              1.2.0                h7f98852_2    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxgboost                1.3.3dev.rapidsai0.18      cuda11.0_0    rapidsai
libxml2                   2.9.10               h72842e0_3    conda-forge
llvmlite                  0.36.0           py38h4630a5e_0    conda-forge
locket                    0.2.0                      py_2    conda-forge
louvain                   0.7.0                    pypi_0    pypi
lz4-c                     1.9.2                he1b5a44_3    conda-forge
markdown                  3.3.4              pyhd8ed1ab_0    conda-forge
markupsafe                1.1.1            py38h497a2fe_3    conda-forge
matplotlib                3.4.0                    pypi_0    pypi
mistune                   0.8.4           py38h497a2fe_1003    conda-forge
msgpack-python            1.0.2            py38h1fd1430_1    conda-forge
multicoretsne             0.1                      pypi_0    pypi
multidict                 5.1.0            py38h497a2fe_1    conda-forge
multipledispatch          0.6.0                      py_0    conda-forge
munch                     2.5.0                      py_0    conda-forge
natsort                   7.1.1                    pypi_0    pypi
nbclient                  0.5.3              pyhd8ed1ab_0    conda-forge
nbconvert                 6.0.7            py38h578d9bd_3    conda-forge
nbformat                  5.1.2              pyhd8ed1ab_1    conda-forge
nccl                      2.7.8.1            h4962215_100    nvidia
ncurses                   6.2                  h58526e2_4    conda-forge
nest-asyncio              1.4.3              pyhd8ed1ab_0    conda-forge
netifaces                 0.10.9          py38h497a2fe_1003    conda-forge
networkx                  2.5                        py_0    conda-forge
nodejs                    14.15.4              h92b4a50_1    conda-forge
notebook                  6.3.0            py38h578d9bd_0    conda-forge
numba                     0.53.1           py38h0e12cce_0    conda-forge
numexpr                   2.7.3                    pypi_0    pypi
numpy                     1.19.5           py38h18fd61f_1    conda-forge
nvtx                      0.2.3            py38h497a2fe_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openjdk                   11.0.9.1             h5cc2fde_1    conda-forge
openjpeg                  2.4.0                hf7af979_0    conda-forge
openssl                   1.1.1k               h27cfd23_0    defaults
orc                       1.6.5                hd3605a7_0    conda-forge
packaging                 20.9               pyh44b312d_0    conda-forge
pandas                    1.1.5            py38h51da96c_0    conda-forge
pandoc                    2.12                 h7f98852_0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
panel                     0.10.3             pyhd8ed1ab_0    conda-forge
param                     1.10.1             pyhd3deb0d_0    conda-forge
parquet-cpp               1.5.1                         2    conda-forge
parso                     0.8.1              pyhd8ed1ab_0    conda-forge
partd                     1.1.0                      py_0    conda-forge
patsy                     0.5.1                    pypi_0    pypi
pcre                      8.44                 he1b5a44_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    8.1.2            py38ha0e1e83_0    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
poppler                   0.89.0               h2de54a5_5    conda-forge
poppler-data              0.4.10                        0    conda-forge
postgresql                12.3                 hc2f5b80_3    conda-forge
proj                      7.1.1                h966b41f_3    conda-forge
prometheus_client         0.9.0              pyhd3deb0d_0    conda-forge
prompt-toolkit            3.0.18             pyha770c72_0    conda-forge
protobuf                  3.13.0.1         py38hadf7658_1    conda-forge
psutil                    5.8.0            py38h497a2fe_1    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py-xgboost                1.3.3dev.rapidsai0.18  cuda11.0py38_0    rapidsai
pyarrow                   1.0.1           py38h3e2403a_14_cuda    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyct                      0.4.6                      py_0    conda-forge
pyct-core                 0.4.6                      py_0    conda-forge
pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
pyee                      7.0.4              pyh9f0ad1d_0    conda-forge
pygments                  2.8.1              pyhd8ed1ab_0    conda-forge
pyhive                    0.6.3              pyhd3deb0d_0    conda-forge
pynndescent               0.5.2                    pypi_0    pypi
pynvml                    8.0.4                      py_1    conda-forge
pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyppeteer                 0.2.2                      py_1    conda-forge
pyproj                    2.6.1.post1      py38h56787f0_3    conda-forge
pyrsistent                0.17.3           py38h497a2fe_2    conda-forge
pysocks                   1.7.1            py38h578d9bd_3    conda-forge
python                    3.8.8           hffdb5ce_0_cpython    conda-forge
python-confluent-kafka    1.5.0            py38h1e0a361_0    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python-igraph             0.9.1                    pypi_0    pypi
python_abi                3.8                      1_cp38    conda-forge
pytz                      2021.1             pyhd8ed1ab_0    conda-forge
pyviz_comms               2.0.1              pyhd3deb0d_0    conda-forge
pyyaml                    5.4.1            py38h497a2fe_0    conda-forge
pyzmq                     22.0.3           py38h2035c66_1    conda-forge
rapids                    0.18.0          cuda11.0_py38_g334c31e_223    rapidsai
rapids-blazing            0.18.0          cuda11.0_py38_g334c31e_223    rapidsai
rapids-xgboost            0.18.0          cuda11.0_py38_g334c31e_223    rapidsai
re2                       2020.10.01           he1b5a44_0    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
requests                  2.25.1             pyhd3deb0d_0    conda-forge
rmm                       0.18.0          cuda_11.0_py38_ga4ee6b7_0    rapidsai
rtree                     0.9.7            py38h02d302b_1    conda-forge
sasl                      0.2.1           py38h950e882_1002    conda-forge
scanpy                    1.7.1                    pypi_0    pypi
scikit-learn              0.24.1           py38h658cfdd_0    conda-forge
scipy                     1.6.2            py38h7b17777_0    conda-forge
seaborn                   0.11.1                   pypi_0    pypi
send2trash                1.5.0                      py_0    conda-forge
setuptools                49.6.0           py38h578d9bd_3    conda-forge
shapely                   1.7.1            py38ha11d057_1    conda-forge
simpervisor               0.4                pyhd8ed1ab_0    conda-forge
sinfo                     0.3.1                    pypi_0    pypi
six                       1.15.0             pyh9f0ad1d_0    conda-forge
snappy                    1.1.8                he1b5a44_3    conda-forge
sniffio                   1.2.0            py38h578d9bd_1    conda-forge
sortedcontainers          2.3.0              pyhd8ed1ab_0    conda-forge
spdlog                    1.7.0                hc9558a2_2    conda-forge
sqlalchemy                1.4.3            py38h497a2fe_0    conda-forge
sqlite                    3.35.3               h74cdb3f_0    conda-forge
statsmodels               0.12.2                   pypi_0    pypi
stdlib-list               0.8.0                    pypi_0    pypi
streamz                   0.6.2              pyh44b312d_0    conda-forge
tables                    3.6.1                    pypi_0    pypi
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
terminado                 0.9.3            py38h578d9bd_0    conda-forge
testpath                  0.4.4                      py_0    conda-forge
texttable                 1.6.3                    pypi_0    pypi
threadpoolctl             2.1.0              pyh5ca1d4c_0    conda-forge
thrift                    0.13.0           py38h709712a_2    conda-forge
thrift_sasl               0.3.0           py38h1e0a361_1002    conda-forge
tiledb                    2.1.6                h1022b9d_0    conda-forge
tk                        8.6.10               h21135ba_1    conda-forge
toolz                     0.11.1                     py_0    conda-forge
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.59.0             pyhd8ed1ab_0    conda-forge
traitlets                 5.0.5                      py_0    conda-forge
treelite                  1.0.0            py38hd08a91b_0    conda-forge
treelite-runtime          1.0.0                    pypi_0    pypi
typing-extensions         3.7.4.3                       0    conda-forge
typing_extensions         3.7.4.3                    py_0    conda-forge
tzcode                    2021a                h7f98852_1    conda-forge
ucx                       1.9.0+gcd9efd3       cuda11.0_0    rapidsai
ucx-proc                  1.0.0                       gpu    rapidsai
ucx-py                    0.18.0          py38_gcd9efd3_0    rapidsai
umap-learn                0.5.1                    pypi_0    pypi
urllib3                   1.26.4             pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
websockets                8.1              py38h497a2fe_3    conda-forge
wget                      3.2                      pypi_0    pypi
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
widgetsnbextension        3.5.1            py38h578d9bd_4    conda-forge
xarray                    0.17.0             pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.3                h9d8b166_2    conda-forge
xgboost                   1.3.3dev.rapidsai0.18  cuda11.0py38_0    rapidsai
xorg-fixesproto           5.0               h14c3975_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.6.12               h516909a_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
xorg-libxi                1.7.10               h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-libxtst              1.2.3             h516909a_1002    conda-forge
xorg-recordproto          1.14.2            h516909a_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
yaml                      0.2.5                h516909a_0    conda-forge
yarl                      1.6.3            py38h497a2fe_1    conda-forge
zeromq                    4.3.4                h9c3ff4c_0    conda-forge
zict                      2.0.0                      py_0    conda-forge
zipp                      3.4.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge
zstd                      1.4.8                hdf46e1d_0    conda-forge
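
Since the full listing is long, the GPU-related entries can be pulled out with a small filter. The sketch below is a hypothetical helper (parse_conda_list is not part of conda) that parses the whitespace-separated text format shown above; the sample rows are copied from this listing.

```python
def parse_conda_list(text: str) -> dict:
    """Map package name -> (version, build, channel) from `conda list` text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) < 2:
            continue  # not a package row
        name, version = fields[0], fields[1]
        build = fields[2] if len(fields) > 2 else ""
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


# Rows copied from the listing above
SAMPLE = """\
# Name                    Version                   Build  Channel
cudatoolkit               11.0.221             h6bb024c_0    nvidia
cupy                      8.0.0            py38hb7c6141_0    rapidsai
rmm                       0.18.0          cuda_11.0_py38_ga4ee6b7_0    rapidsai
"""

packages = parse_conda_list(SAMPLE)
for name in ("cudatoolkit", "cupy", "rmm"):
    print(name, *packages[name])
```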

I installed the rapids-0.18 environment with conda create -p ~/conda/rapids-0.18 -c rapidsai -c nvidia -c conda-forge -c defaults rapids-blazing=0.18 python=3.8 cudatoolkit=11.0. Then I installed the other packages like Scanpy with pip.

Yesterday I also tried the CUDA 11.0 Docker image, with the same result.

Intron7 avatar Apr 01 '21 22:04 Intron7

I can reproduce this issue. Additional stack trace info:

ipp1-0129:2216 :0:2216] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
==== backtrace (tid: 2216) ====
 0 /root/conda/rapids-0.18/lib/python3.8/site-packages/ucp/_libs/../../../../libucs.so.0(ucs_handle_error+0x115) [0x7fdc36e10ee5]
 1 /root/conda/rapids-0.18/lib/python3.8/site-packages/ucp/_libs/../../../../libucs.so.0(+0x26281) [0x7fdc36e11281]
 2 /root/conda/rapids-0.18/lib/python3.8/site-packages/ucp/_libs/../../../../libucs.so.0(+0x26452) [0x7fdc36e11452]
 3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fde1a6fd730]
 4 /root/conda/rapids-0.18/lib/python3.8/site-packages/scipy/sparse/_sparsetools.cpython-38-x86_64-linux-gnu.so(+0x4e7e) [0x7fdd7951de7e]
 5 /root/conda/rapids-0.18/lib/python3.8/site-packages/scipy/sparse/_sparsetools.cpython-38-x86_64-linux-gnu.so(+0x5bb0) [0x7fdd7951ebb0]
 6 /root/conda/rapids-0.18/bin/python(PyCFunction_Call+0xf9) [0x56359fcf0e99]
 7 /root/conda/rapids-0.18/bin/python(_PyObject_MakeTpCall+0x31e) [0x56359fcfff2e]
 8 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x4f2e) [0x56359fd99b4e]
 9 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
10 /root/conda/rapids-0.18/bin/python(+0x1b2007) [0x56359fd77007]
11 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x1782) [0x56359fd963a2]
12 /root/conda/rapids-0.18/bin/python(+0x1b1e86) [0x56359fd76e86]
13 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x4ca3) [0x56359fd998c3]
14 /root/conda/rapids-0.18/bin/python(_PyFunction_Vectorcall+0x1a6) [0x56359fd76706]
15 /root/conda/rapids-0.18/bin/python(+0x18287d) [0x56359fd4787d]
16 /root/conda/rapids-0.18/bin/python(PyObject_GetItem+0x45) [0x56359fd4cbd5]
17 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0xd3d) [0x56359fd9595d]
18 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
19 /root/conda/rapids-0.18/bin/python(PyEval_EvalCodeEx+0x39) [0x56359fd76559]
20 /root/conda/rapids-0.18/bin/python(PyEval_EvalCode+0x1b) [0x56359fe199ab]
21 /root/conda/rapids-0.18/bin/python(+0x2731de) [0x56359fe381de]
22 /root/conda/rapids-0.18/bin/python(+0x128d4b) [0x56359fcedd4b]
23 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x92f) [0x56359fd9554f]
24 /root/conda/rapids-0.18/bin/python(+0x182ea3) [0x56359fd47ea3]
25 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x1d37) [0x56359fd96957]
26 /root/conda/rapids-0.18/bin/python(+0x182ea3) [0x56359fd47ea3]
27 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x1d37) [0x56359fd96957]
28 /root/conda/rapids-0.18/bin/python(+0x182ea3) [0x56359fd47ea3]
29 /root/conda/rapids-0.18/bin/python(+0x1958c9) [0x56359fd5a8c9]
30 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0xa4b) [0x56359fd9566b]
31 /root/conda/rapids-0.18/bin/python(_PyFunction_Vectorcall+0x1a6) [0x56359fd76706]
32 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x92f) [0x56359fd9554f]
33 /root/conda/rapids-0.18/bin/python(_PyFunction_Vectorcall+0x1a6) [0x56359fd76706]
34 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0xa4b) [0x56359fd9566b]
35 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
36 /root/conda/rapids-0.18/bin/python(_PyFunction_Vectorcall+0x378) [0x56359fd768d8]
37 /root/conda/rapids-0.18/bin/python(+0x1b1f91) [0x56359fd76f91]
38 /root/conda/rapids-0.18/bin/python(PyObject_Call+0x5e) [0x56359fcea0be]
39 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x21c1) [0x56359fd96de1]
40 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
41 /root/conda/rapids-0.18/bin/python(+0x1b2007) [0x56359fd77007]
42 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x1782) [0x56359fd963a2]
43 /root/conda/rapids-0.18/bin/python(+0x1925da) [0x56359fd575da]
44 /root/conda/rapids-0.18/bin/python(+0x128d4b) [0x56359fcedd4b]
45 /root/conda/rapids-0.18/bin/python(+0x13b3ea) [0x56359fd003ea]
46 /root/conda/rapids-0.18/bin/python(+0x21da4f) [0x56359fde2a4f]
47 /root/conda/rapids-0.18/bin/python(+0x128fc2) [0x56359fcedfc2]
48 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x92f) [0x56359fd9554f]
49 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
50 /root/conda/rapids-0.18/bin/python(_PyFunction_Vectorcall+0x378) [0x56359fd768d8]
51 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0xa4b) [0x56359fd9566b]
52 /root/conda/rapids-0.18/bin/python(+0x1925da) [0x56359fd575da]
53 /root/conda/rapids-0.18/bin/python(+0x128d4b) [0x56359fcedd4b]
54 /root/conda/rapids-0.18/bin/python(+0x13b3ea) [0x56359fd003ea]
55 /root/conda/rapids-0.18/bin/python(+0x21da4f) [0x56359fde2a4f]
56 /root/conda/rapids-0.18/bin/python(+0x128fc2) [0x56359fcedfc2]
57 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x92f) [0x56359fd9554f]
58 /root/conda/rapids-0.18/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x56359fd75503]
59 /root/conda/rapids-0.18/bin/python(+0x1b2007) [0x56359fd77007]
60 /root/conda/rapids-0.18/bin/python(_PyEval_EvalFrameDefault+0x92f) [0x56359fd9554f]
61 /root/conda/rapids-0.18/bin/python(+0x1925da) [0x56359fd575da]

rilango avatar Apr 16 '21 23:04 rilango

@Intron7 Please try with USE_FIRST_N_CELLS = 700000

Meanwhile, we will be creating a bug with RAPIDS regarding this issue.

rilango avatar Apr 20 '21 00:04 rilango

@rilango I ran it with 70000 cells today and it ran perfectly.

Intron7 avatar Apr 20 '21 14:04 Intron7

Hi,

I am trying to run the GPU pipeline on my own dataset, where adata.X.shape is (645559, 66696). I am running this on my local machine, which has 80 GB of RAM and an NVIDIA RTX 3090 with 24 GB of VRAM.

The workflow runs but

%%time
tmp_norm = sparse_gpu_array.tocsc()
marker_genes_raw = {
    ("%s_raw" % marker): tmp_norm[:, genes[genes == marker].index[0]].todense().ravel()
    for marker in markers
}

del tmp_norm

gives a memory error: "MemoryError: std::bad_alloc: CUDA error at: /home/monib/anaconda3/envs/rapidgenomics/include/rmm/mr/device/managed_memory_resource.hpp:73: cudaErrorIllegalAddress an illegal memory access was encountered"

Any suggestions?

Best regards, Monib

m0nib avatar Oct 18 '21 10:10 m0nib

You seem to have hit the same error that I had. @rilango pointed out that the notebooks tend to crash if you oversubscribe the VRAM by more than 2x. This example relies heavily on UVM. While it should work on any GPU built on the Pascal architecture or newer, you will want to make sure there is enough main memory available. Oversubscribing a GPU by more than a factor of 2 can cause thrashing in UVM, which can ultimately lead to the notebook freezing.
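
A rough way to check the 2x rule of thumb before launching a notebook is to compare the estimated CSR footprint with the card's memory. The sketch below is illustrative only: the helper names, the 4-byte value and index sizes (float32 data and int32 indices, as in the cp.sparse.csr_matrix call from the original report), and the example nonzero count are all assumptions, and real usage will be higher because of intermediate copies.

```python
def csr_nbytes(n_rows, nnz, value_bytes=4, index_bytes=4):
    """Approximate bytes held by a CSR matrix: data + indices + indptr."""
    data = nnz * value_bytes
    indices = nnz * index_bytes
    indptr = (n_rows + 1) * index_bytes
    return data + indices + indptr


def oversubscription_factor(required_bytes, vram_bytes):
    """Ratio of requested memory to physical GPU memory (> 1 means UVM paging)."""
    return required_bytes / vram_bytes


GIB = 1024 ** 3

# Hypothetical example: 1,000,000 cells with 2,000 nonzero genes each.
# The density is made up for illustration, not measured from any dataset.
n_cells = 1_000_000
nnz = n_cells * 2_000
required = csr_nbytes(n_cells, nnz)
factor = oversubscription_factor(required, 24 * GIB)  # assumed 24 GiB card

print(f"CSR footprint: {required / GIB:.1f} GiB, factor: {factor:.2f}x")
if factor > 2:
    print("More than 2x oversubscribed: UVM thrashing is likely")
```

The estimate covers a single copy of the matrix; conversions such as tocsc() temporarily need room for both layouts, so it is safer to leave generous headroom.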

Intron7 avatar Oct 18 '21 11:10 Intron7

I see. How did you change this? Is there a command, setting, or function call option I have to set for this?

Thank you

m0nib avatar Oct 18 '21 18:10 m0nib

Recently a multi-GPU version of the notebook was added to the examples.

https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/1M_brain_gpu_analysis_multigpu.ipynb

This allows much larger datasets. Please check if this helps.

rilango avatar Oct 18 '21 19:10 rilango

@m0nib What kind of dataset do you have with almost 70,000 features? I would suggest restricting the feature space and freeing up as much VRAM as possible.
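
To see why trimming the feature space matters, it can help to compare the dense float32 footprint of the matrix before and after gene selection, since steps that densify the data scale with the number of columns. This is a back-of-the-envelope sketch: `dense_nbytes` is a hypothetical helper, and the 4,000-gene target is only an assumed example value.

```python
def dense_nbytes(n_rows, n_cols, itemsize=4):
    """Bytes needed for a dense n_rows x n_cols array (float32 by default)."""
    return n_rows * n_cols * itemsize


GB = 10 ** 9

# Shape reported for the dataset discussed in this thread.
n_cells, n_genes = 645_559, 66_696

full = dense_nbytes(n_cells, n_genes)
reduced = dense_nbytes(n_cells, 4_000)  # assumed target after gene selection

print(f"all genes:   {full / GB:.1f} GB")
print(f"4,000 genes: {reduced / GB:.1f} GB")
print(f"reduction:   {full / reduced:.1f}x")
```

The sparse representation is much smaller than the full dense figure, but any operation that materializes dense columns or rows benefits from the same proportional reduction.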

Intron7 avatar Oct 18 '21 19:10 Intron7

@Intron7 It’s a human lung cell atlas dataset. I have a subset with the disease of interest. I collated 4 datasets as well as the HLCA and want to do clustering and differential analysis.

@rilango Thanks for the info, I will have a look.

m0nib avatar Oct 18 '21 23:10 m0nib

Hi @rilango, I tried setting up a fresh conda environment using the latest rapidgenomics files and ran the multi-GPU 1M_brain notebook.

I have only a single GPU (RTX 3090), so I made sure that was reflected in the cluster block of the notebook.

After running through the notebook a few times, I keep getting an error:

%%time
from cuml.dask.decomposition import PCA
pca_data = PCA(n_components=50).fit_transform(dask_sparse_arr)
pca_data.compute_chunk_sizes()

distributed.worker - WARNING - Compute Failed
Function: _func_fit
args: (PCAMG(), < could not convert arg to str >, 1291337, 4000, [(0, 163266), (0, 163266), (0, 163266), (0, 163266), (0, 163266), (0, 163266), (0, 163266), (0, 148475)], 0, False)
kwargs: {}
Exception: CUDADriverError('CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered')

Using my own dataset, I can get past this point, but I get an error in adata.obsm['X_tsne'] = TSNE().fit_transform(adata.X[:,:tsne_n_pcs])

Any suggestions on this issue?

Thanks. Best regards, Monib
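
The partition list in the warning above can be sanity-checked with a few lines of plain Python. The sketch copies the numbers from the log; reading each tuple as (worker rank, rows in partition) and the two preceding integers as (total rows, columns) is an assumed interpretation of the `_func_fit` arguments, not something the log confirms, and the dense float32 estimate is likewise an assumption about how the data is held during the fit.

```python
# Values copied from the "Compute Failed" warning.
parts = [(0, 163266)] * 7 + [(0, 148475)]
n_rows, n_cols = 1291337, 4000

# Assumed meaning: each tuple is (worker rank, number of rows in the partition).
rows_in_parts = sum(rows for _, rows in parts)
ranks = {rank for rank, _ in parts}

# Assumed: float32 (4 bytes per value) if the full matrix were densified.
dense_bytes = n_rows * n_cols * 4

print("partitions cover all rows:", rows_in_parts == n_rows)
print("distinct worker ranks:", len(ranks))
print(f"dense float32 size: {dense_bytes / 1e9:.2f} GB")
```

Under that reading, all eight partitions sit on a single worker, which matches the single-GPU setup, and one dense float32 copy of the input would already take up most of a 24 GB card.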

m0nib avatar Oct 20 '21 11:10 m0nib

@m0nib Sorry for the delayed response. Can you please send us the error you are getting from adata.obsm['X_tsne'] = TSNE().fit_transform(adata.X[:,:tsne_n_pcs])?

rilango avatar Oct 29 '21 16:10 rilango