Importing Numpy Error with Hello World Script on MN5

Open manuel-g-castro opened this issue 1 year ago • 7 comments

Component

RUNTIME / PYTHON BINDING

Both because I am unsure.

Environment

  • COMPSs version: 3.3
  • Python : 3.8.18
  • Operating System: Red Hat Enterprise Linux 9.2 (MareNostrum 5's OS)

Description

When I run the simple script that Jorge provided on MareNostrum 5, it fails with an error while importing numpy.

Minimal example to reproduce

Execute the following script

module load python/3.8.18
module load COMPSs/3.3

launch_compss \
    --sc_cfg=mn.cfg \
    --master_node="$SLURMD_NODENAME" \
    --worker_nodes="" \
    --worker_in_master_cpus=48 \
    --lang="python" \
    --pythonpath=$(pwd) \
    test.py

where test.py is

#!/usr/bin/python3
# -*- coding: utf-8 -*-
from pycompss.api.api import compss_wait_on
from pycompss.api.task import task
from pycompss.api.parameter import *


# Declare hello as a COMPSs task.
@task()
def hello(name):
    return "Hello " + name


if __name__ == '__main__':
    res = hello("world")
    # Synchronize and retrieve the task result.
    res = compss_wait_on(res)
    print(res)

Exception

WARNING: Import ERROR importing Numpy
Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.

manuel-g-castro avatar May 17 '24 09:05 manuel-g-castro

@manuel-g-castro, I got a different result when running your test.py with the same command on glogin1.

$ launch_compss     --sc_cfg=mn.cfg     --master_node="$SLURMD_NODENAME"     --worker_nodes=""     --worker_in_master_cpus=48     --lang="python"     --pythonpath=$(pwd)     test.py
Missing master node parameter


Port 43271 is already in use or time_wait, incrementing port by 1
Port 43272 is already in use or time_wait, incrementing port by 1
Port 43273 is already in use or time_wait, incrementing port by 1
srun: error: No account specified, please specify an account
srun: error: Unable to allocate resources: Unspecified error
------ Launching COMPSs application ------
No master to run...

I logged in and tried your commands to load the modules too.

$ module list

Currently Loaded Modules:
  1) intel/2023.2.0   4) ucx/1.15.0        7) python/3.8.18
  2) impi/2021.10.0   5) oneapi/2023.2.0   8) papi/7.1.0-gcc
  3) mkl/2023.2.0     6) bsc/1.0           9) COMPSs/3.3
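
The "Missing master node parameter" message above is consistent with $SLURMD_NODENAME being empty, which would be expected on a login node such as glogin1 outside any Slurm job. A minimal Python sketch for checking this before calling launch_compss follows. The helper name missing_slurm_vars is hypothetical, and the sketch assumes that SLURM_JOB_ID and SLURMD_NODENAME are the variables exported inside an allocation on this system.

```python
import os


def missing_slurm_vars(env, required=("SLURM_JOB_ID", "SLURMD_NODENAME")):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_slurm_vars(os.environ)
    if missing:
        # An empty SLURMD_NODENAME would make --master_node="" on the command line.
        print("Not set (possibly outside a Slurm job):", ", ".join(missing))
    else:
        print("Master node would be:", os.environ["SLURMD_NODENAME"])
```

If the list is not empty, launch_compss would receive an empty --master_node value, so the session should be checked before looking at module problems.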

kinow avatar May 17 '24 11:05 kinow

@kinow and @manuel-g-castro I suppose you are submitting the script that contains the launch_compss call with sbatch. Because of the changes in MareNostrum 5, job submissions must specify the account with the -A or --account flag.

jorgee avatar May 17 '24 12:05 jorgee

@manuel-g-castro could you try adding module load hdf5? In principle it shouldn't be required, but I think it was loaded during compilation because it is needed for python/3.12.1 on MN5, and something has been linked against this library.

jorgee avatar May 17 '24 12:05 jorgee

Hey, @jorgee, thank you for the fast answer. I am checking this issue after the long weekend.

There is a fundamental piece of information that I forgot to mention, my bad: I run all of this within an interactive session in Slurm.

So the first step to reproduce the error is to execute salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces.

I have now tried again with the hdf5 module loaded and it is still failing. My understanding is that I do not need to specify any account, since the resources are already allocated. Am I right?

The command that I am executing:

[bsc032371@gs08r2b69 compss-test]$ launch_compss     --sc_cfg=mn.cfg     --master_node="$SLURMD_NODENAME"     --worker_nodes=""     --worker_in_master_cpus=48     --lang="python"     --pythonpath=$(pwd) $(pwd)/test.py 
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss

----------------- Executing test.py --------------------------

Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
    ^^^^^^^^^^^^^
ImportError: libcilkrts.so.5: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.

These are my loaded modules:

[bsc032371@gs08r2b69 compss-test]$ module list

Currently Loaded Modules:
  1) mkl/2023.2.0   3) oneapi/2023.2.0   5) intel/2024.1   7) hdf5/1.14.4.2    9) python/3.12.1   11) COMPSs/3.3
  2) ucx/1.15.0     4) bsc/1.0           6) impi/2021.12   8) sqlite3/3.45.2  10) papi/7.1.0-gcc
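
The caret markers (^^^^) in this traceback are printed only by Python 3.11 or newer, which matches the python/3.12.1 module in the list above rather than the python/3.8.18 reported in the issue. A small sketch to confirm which interpreter is active after loading modules; describe_interpreter is a hypothetical helper name, not part of COMPSs.

```python
import sys


def describe_interpreter():
    """Return the running Python version and the path of its executable."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"{version} ({sys.executable})"


if __name__ == "__main__":
    print(describe_interpreter())
```

Running this with the same python3 that launch_compss picks up shows whether the session is on the Python version the bindings were expected to use.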

manuel-g-castro avatar May 21 '24 09:05 manuel-g-castro

Logged in to glogin4, then tried the same as you did, @manuel-g-castro, but also loading impi, intel, and hdf5:

$ salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces
$ module load python/3.8.18
$ module load COMPSs/3.3
$ module load impi/2021.12 intel/2024.1 hdf5/1.14.4.2
$ file test.py
test.py: Python script, ASCII text executable
$ launch_compss \
    --sc_cfg=mn.cfg \
    --master_node="$SLURMD_NODENAME" \
    --worker_nodes="" \
    --worker_in_master_cpus=48 \
    --lang="python" \
    --pythonpath=$(pwd) \
    test.py
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss

----------------- Executing test.py --------------------------

WARNING: Import ERROR importing Numpy
Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.

There might be some other combination of modules that works, but I guess @jorgee knows better which ones have to be loaded for your script to work.

kinow avatar May 30 '24 15:05 kinow

It is also happening without COMPSs. Please tell [email protected]

[bsc019611@glogin2 ~]$ module load python/3.8.18
load PYTHON/3.8.18 (PATH, MANPATH, LD_LIBRARY_PATH, LIBRARY_PATH, PKG_CONFIG_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, PYTHONHOME, PYTHONPATH)
[bsc019611@glogin2 ~]$ python3
Python 3.8.18 (default, Feb 7 2024, 09:13:21)
[Clang 17.0.0 (icx 2024.0.0.20231017)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
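
Since the failure reproduces in a plain interpreter, the shared libraries named in the errors in this thread can be probed directly, without numpy or COMPSs. The following is a minimal sketch using the standard ctypes module; unloadable_libraries is a hypothetical helper name, and the library names are copied from the ImportError messages above.

```python
import ctypes


def unloadable_libraries(names):
    """Return {name: error message} for each library the dynamic loader cannot open."""
    failures = {}
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError as exc:
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    # Library names taken from the ImportError lines in this thread.
    wanted = ["libhdf5.so.310", "libcilkrts.so.5"]
    for name, error in unloadable_libraries(wanted).items():
        print(f"{name}: {error}")
```

Running it after each module load shows which module combination makes both libraries resolvable in the current environment.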

jorgee avatar May 31 '24 13:05 jorgee

I think they have changed the default intel modules. Are you using COMPSS_PYTHON_VERSION? It would be great if you sent these MareNostrum-related errors to [email protected], as they have more to do with the installation on the supercomputer than with errors in the code.

jorgee avatar May 31 '24 14:05 jorgee

Hello, @jorgee . Sorry for the late response. I was stuck on Ph.D bureaucracy for a while.

I read in your documentation that setting that flag should not be needed if you load the python library before COMPSs.

Anyhow, I tried again and the issue persisted. Therefore, I contacted support and put you and Daniele on CC (I hope not to bother too much).

Thank you.

manuel-g-castro avatar Aug 30 '24 10:08 manuel-g-castro

Closing this issue because Javier Conejero has answered. I needed to import the hdf library.

manuel-g-castro avatar Aug 30 '24 12:08 manuel-g-castro