Singularity COMPSs Container
Level
CRITICAL (or MINIMAL if I am doing something wrong; I am not sure)
Component
DOCKER Image
Environment
- COMPSs version: 3.3
- Bindings: Python
- OS: Red Hat Enterprise Linux 9.2
Description
I am preparing a publication that compares different workflow managers for orchestrating multiple workflow tasks.
Besides the performance of each tool, I also want to evaluate how easy each one is to install, since we need to support many different systems.
I saw that COMPSs provides a Docker image, which can easily be converted into a Singularity container.
The problem is that the container cannot start the sshd service, which is required for the workers to connect to the master. This is caused by the protected host keys that Singularity mounts automatically.
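One quick way to confirm the sshd symptom independently of COMPSs is to test whether anything accepts TCP connections on the SSH port of the master node. The sketch below is a hypothetical diagnostic, not part of COMPSs. It assumes sshd would listen on the default port 22. A successful connection only shows that a listener exists; it does not prove that the passwordless SSH login used by COMPSs works.

```python
import socket


def sshd_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical helper: port 22 is assumed to be the sshd port. A True result
    only means something is listening; it does not verify key-based login.
    """
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False
```

For example, `sshd_reachable("glogin4")` can be called from inside the container, using the master node name from the reproduction command.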
My use case also differs from the one described in the documentation: I do not want to run a containerized application. Rather, I want to use the container to avoid the time spent installing COMPSs on the host machine.
Am I running it correctly?
Minimal example to reproduce
Any valid Python script can be used; mine is called simple_workflow.py.
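For anyone reproducing the issue without the original script, here is a minimal PyCOMPSs workflow that could serve as a stand-in. It is hypothetical and is NOT the reporter's simple_workflow.py. It uses the `@task` decorator from `pycompss.api.task` and `compss_wait_on` from `pycompss.api.api`. The fallback stubs are an assumption added only so the file also runs sequentially with plain `python` where PyCOMPSs is not installed.

```python
# Hypothetical stand-in for simple_workflow.py (not the reporter's script).
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Assumption: outside the COMPSs container, replace the PyCOMPSs API
    # with no-op stand-ins so the script still runs sequentially.
    def task(**_kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj


@task(returns=1)
def square(x):
    # Under the COMPSs runtime, each call is submitted as an asynchronous task.
    return x * x


def main():
    partials = [square(i) for i in range(4)]
    # Synchronise: retrieve the actual values produced by the tasks.
    partials = compss_wait_on(partials)
    return sum(partials)


if __name__ == "__main__":
    print("sum of squares:", main())
```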
singularity pull docker://compss/compss
singularity run compss_latest.sif launch_compss --master_node=glogin4 --worker_in_master_cpus=1 --lang=python --pythonpath=$(pwd) $(pwd)/simple_workflow.py
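For repeated benchmark runs, as a tool comparison would need, the same command can be assembled programmatically. The sketch below is a hypothetical wrapper. It only reuses the flags already shown in the command above and assumes `singularity` is on `PATH`.

```python
import os
import shlex
import subprocess


def build_launch_cmd(image, master_node, script, pythonpath=None, worker_cpus=1):
    """Assemble the reproduction command as an argument list.

    Hypothetical helper: only the launch_compss flags used in this report
    are included; pythonpath defaults to the current directory like $(pwd).
    """
    pythonpath = pythonpath or os.getcwd()
    return [
        "singularity", "run", image,
        "launch_compss",
        f"--master_node={master_node}",
        f"--worker_in_master_cpus={worker_cpus}",
        "--lang=python",
        f"--pythonpath={pythonpath}",
        os.path.join(pythonpath, script),
    ]


def run(cmd):
    # Assumes `singularity` is available on PATH; returns the exit code.
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Usage: `run(build_launch_cmd("compss_latest.sif", "glogin4", "simple_workflow.py"))`.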
Exception
(base) [bsc032371@glogin4 ~]$ singularity run compss_latest.sif launch_compss --master_node=glogin4 --worker_in_master_cpus=1 --lang=python --pythonpath=$(pwd) /home/bsc/bsc032371/compss-test/simple_workflow.py
/opt/COMPSs/Runtime/scripts/user/launch_compss: line 1638: [: disabled: integer expression expected
/opt/COMPSs/Runtime/scripts/user/launch_compss: line 1648: netstat: command not found
/opt/COMPSs/Runtime/scripts/user/launch_compss: line 1648: lsof: command not found
/opt/COMPSs/Runtime/scripts/user/launch_compss: line 543: blaunch: command not found
------ Launching COMPSs application ------
/opt/COMPSs/Runtime/scripts/user/launch_compss: line 2160: blaunch: command not found
[ INFO ] Using default execution type: compss
----------------- Executing simple_workflow.py --------------------------
WARNING: Import ERROR importing Numpy
WARNING: COMPSs Properties file is null. Setting default values
[(602) API] - Starting COMPSs Runtime v3.3.rc2404 (build 20240418-1855.r8290b6f9bb2604145d5cb1e3c36706387499ca01)
[ERRMGR] - WARNING: There was an exception when initiating worker glogin4.
Stack trace:
es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: An Error has occurred when queue system started NIO worker in resource glogin4. Retries not available in this option.
at es.bsc.compss.nio.master.WorkerStarter.killPreviousWorker(WorkerStarter.java:259)
at es.bsc.compss.nio.master.WorkerStarter.startWorker(WorkerStarter.java:134)
at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:141)
at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:165)
at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$2.run(StartWorkerAction.java:133)
[ERRMGR] - WARNING: Exception creating worker. Check runtime.log for more details
Stack trace:
es.bsc.compss.exceptions.InitNodeException: [START_CMD_ERROR]: An Error has occurred when queue system started NIO worker in resource glogin4. Retries not available in this option.
at es.bsc.compss.nio.master.WorkerStarter.killPreviousWorker(WorkerStarter.java:259)
at es.bsc.compss.nio.master.WorkerStarter.startWorker(WorkerStarter.java:134)
at es.bsc.compss.nio.master.NIOWorkerNode.start(NIOWorkerNode.java:141)
at es.bsc.compss.types.resources.ResourceImpl.start(ResourceImpl.java:165)
at es.bsc.compss.scheduler.types.allocatableactions.StartWorkerAction$2.run(StartWorkerAction.java:133)
[ERRMGR] - WARNING: No task could be scheduled to any of the available resources.
This could end up blocking COMPSs. Will check it again in 20 seconds.
Possible causes:
-Network problems: non-reachable nodes, sshd service not started, etc.
-There isn't any computing resource that fits the defined tasks constraints.
If this happens 2 more times, the runtime will shutdown.
[ERRMGR] - WARNING: No task could be scheduled to any of the available resources.
This could end up blocking COMPSs. Will check it again in 20 seconds.
Possible causes:
-Network problems: non-reachable nodes, sshd service not started, etc.
-There isn't any computing resource that fits the defined tasks constraints.
If this happens 1 more time, the runtime will shutdown.
[ERRMGR] - ERROR: Unschedulable tasks detected.
COMPSs has found tasks with constraints that cannot be fulfilled.
Shutting down COMPSs now...
[ERRMGR] - Shutting down COMPSs...
[(92880) API] - Execution Finished
Shutting down the running process
Error running application
Master execution failed. Exiting job.
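The log shows that `launch_compss` tried to call `netstat`, `lsof` and `blaunch`, and none of them were found. The hypothetical checker below lists which of these commands are missing from `PATH` in the environment where it runs. `ssh` is an assumed addition to that list, because COMPSs connects to workers over SSH. Note that `sshd` usually lives in `/usr/sbin` and may not be on a regular user's `PATH`.

```python
import shutil

# netstat, lsof and blaunch come from the log above; ssh is an assumed extra.
TOOLS = ("netstat", "lsof", "blaunch", "ssh")


def missing_tools(tools=TOOLS):
    """Return the subset of `tools` that shutil.which cannot find on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    print("missing in this environment:", missing_tools() or "none")
```

Assuming `python3` is available in the image, the script can be run inside the container with `singularity exec compss_latest.sif python3 check_tools.py`, where `check_tools.py` is a hypothetical file name.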
Expected behaviour
The Python script executes correctly.