
🐛 `FileNotFoundError` related to monitors

Open: mbercx opened this issue 1 month ago • 3 comments

While running a workflow (with `engine.run`), I reliably run into the following error:

```
FileNotFoundError: [Errno 2] No such file
```

Judging from the full traceback (see below), the issue seems to be related to the monitors. The workflow does complete successfully in the end, but the errors produce a lot of noise that will worry users.

I tested both the `core.ssh` and `core.ssh_async` transport plugins, and I run into the issue with both.

Full Traceback
```
11/05/2025 11:52:01 AM <1716> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/transports.py", line 106, in request_transport
    yield transport_request.future
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/processes/calcjobs/tasks.py", line 257, in do_monitor
    return monitors.process(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/processes/calcjobs/monitors.py", line 204, in process
    monitor_result = monitor_function(node, transport, **monitor.kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-vasp/src/aiida_vasp/calcs/monitors.py", line 66, in monitor_stdout
    file_stat = transport.get_attribute(stdout_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/transports/plugins/ssh.py", line 1201, in get_attribute
    paramiko_attr = self.lstat(path)
                    ^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/transports/plugins/ssh.py", line 662, in lstat
    return self.sftp.lstat(path)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 511, in lstat
    t, msg = self._request(CMD_LSTAT, path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 857, in _request
    return self._read_response(num)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 909, in _read_response
    self._convert_status(msg)
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 938, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file

11/05/2025 11:52:01 AM <1716> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 1 of do_monitor excepted, retrying after 20 seconds
Traceback (most recent call last):
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/utils.py", line 205, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/processes/calcjobs/tasks.py", line 257, in do_monitor
    return monitors.process(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/engine/processes/calcjobs/monitors.py", line 204, in process
    monitor_result = monitor_function(node, transport, **monitor.kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-vasp/src/aiida_vasp/calcs/monitors.py", line 66, in monitor_stdout
    file_stat = transport.get_attribute(stdout_path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/transports/plugins/ssh.py", line 1201, in get_attribute
    paramiko_attr = self.lstat(path)
                    ^^^^^^^^^^^^^^^^
  File "/Users/mbercx/project/defect/git/aiida-core/src/aiida/transports/plugins/ssh.py", line 662, in lstat
    return self.sftp.lstat(path)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 511, in lstat
    t, msg = self._request(CMD_LSTAT, path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 857, in _request
    return self._read_response(num)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 909, in _read_response
    self._convert_status(msg)
  File "/Users/mbercx/.aiida_venvs/defect/lib/python3.12/site-packages/paramiko/sftp_client.py", line 938, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file
```
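
The traceback shows the exception escaping from the monitor's `transport.get_attribute(stdout_path)` call. Below is a minimal sketch of one possible mitigation, assuming (not confirmed in this thread) that the monitor simply stats the stdout file before the job has created it. `tolerate_missing_files`, `FakeTransport`, and `check_stdout_size` are hypothetical names for illustration only; this is not the aiida-vasp monitor from the traceback. The sketch also assumes that, as in aiida-core's monitor interface, returning `None` means "take no action".

```python
import errno
import functools
from typing import Any, Callable, Optional


def tolerate_missing_files(monitor: Callable[..., Optional[str]]) -> Callable[..., Optional[str]]:
    """Hypothetical wrapper: treat a missing remote file as 'nothing to report yet'."""

    @functools.wraps(monitor)
    def wrapper(node: Any, transport: Any, **kwargs: Any) -> Optional[str]:
        try:
            return monitor(node, transport, **kwargs)
        except FileNotFoundError:
            # The file (e.g. stdout) may not exist until the job starts writing it;
            # returning None is assumed to mean "no action" for this monitor iteration.
            return None

    return wrapper


class FakeTransport:
    """Hypothetical stand-in for an AiiDA transport, holding remote file sizes in memory."""

    def __init__(self, files: dict[str, int]) -> None:
        self.files = files

    def get_attribute(self, path: str) -> dict[str, int]:
        if path not in self.files:
            # Same call paramiko makes; Python turns OSError(ENOENT, ...) into FileNotFoundError.
            raise IOError(errno.ENOENT, 'No such file')
        return {'st_size': self.files[path]}


@tolerate_missing_files
def check_stdout_size(node: Any, transport: Any, stdout_path: str = 'stdout', max_size: int = 1000) -> Optional[str]:
    """Hypothetical monitor: return a message (i.e. request a kill) if stdout grows too large."""
    file_stat = transport.get_attribute(stdout_path)
    if file_stat['st_size'] > max_size:
        return f'{stdout_path} exceeded {max_size} bytes'
    return None
```

For example, `check_stdout_size(None, FakeTransport({}))` returns `None` instead of raising. Catching the exception keeps the check to a single remote call; the trade-off is that a file that never appears is silently ignored, so an explicit existence check first (aiida-core transports provide `path_exists`) may be preferable when that case matters.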

mbercx commented on Nov 05 '25 02:11

I also get a bunch of these warnings with both SSH transport plugins:

```
11/05/2025 11:52:27 AM <1716> aiida.engine.processes.calcjobs.tasks: [WARNING] CalcJob<939> already marked as `CalcJobState.STASHING`, skipping task_monitor_job
```

mbercx commented on Nov 05 '25 02:11

> 11/05/2025 11:52:27 AM <1716> aiida.engine.processes.calcjobs.tasks: [WARNING] CalcJob<939> already marked as `CalcJobState.STASHING`, skipping task_monitor_job

These show up because of persistence; for example, they may appear after you restart or shut down the daemon. Do you see them in a normal, clean run?

khsrali commented on Nov 05 '25 10:11

> Do you see them in a normal clean run?

Yes. I only noticed all of these because I'm running the workflow with `engine.run`. This is on my own laptop, but let me see if I can reproduce the issue or set something up on thanos so you can check it directly.

mbercx commented on Nov 05 '25 19:11