Adam Moody
The ``JobLauncher`` interface for ``launch_run()`` is ambiguous in that some launchers require the list of nodes to run on (like ``aprun`` and ``mpirun``) while others take the list of nodes...
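One way to remove that ambiguity would be to pass both lists explicitly and let each launcher consume whichever it needs. A minimal Python sketch, where the ``up_nodes``/``down_nodes`` parameter names are assumptions for illustration rather than SCR's actual signature::

    # Hypothetical sketch: pass both node lists so each launcher can
    # use whichever it needs; not SCR's actual interface.
    from abc import ABC, abstractmethod
    import subprocess

    class JobLauncher(ABC):
        @abstractmethod
        def launch_run(self, args, up_nodes=None, down_nodes=None):
            """Launch a run, given nodes to use and nodes to avoid."""

    class Mpirun(JobLauncher):
        def launch_run(self, args, up_nodes=None, down_nodes=None):
            # mpirun wants an explicit list of nodes to run on
            cmd = ['mpirun', '--host', ','.join(up_nodes or [])] + args
            return subprocess.Popen(cmd)

    class Srun(JobLauncher):
        def launch_run(self, args, up_nodes=None, down_nodes=None):
            # srun can instead exclude the nodes known to be down
            cmd = ['srun']
            if down_nodes:
                cmd += ['--exclude', ','.join(down_nodes)]
            return subprocess.Popen(cmd + args)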
Our ``scr_run.py`` script currently launches the user job under the launcher process via ``subprocess.Popen``. There are a few challenges with this: 1) we buffer all ``stdout`` and ``stderr`` and...
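For comparison, the child's output can be streamed line by line as it is produced rather than buffered to the end. A minimal sketch, where ``srun ./my_app`` stands in for the real launcher command and merging ``stderr`` into ``stdout`` is a simplification::

    import subprocess
    import sys

    # Stream output as it arrives instead of buffering it all;
    # merging stderr into stdout keeps the example simple and
    # avoids deadlock from reading two pipes.
    proc = subprocess.Popen(
        ['srun', './my_app'],          # placeholder launcher command
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in proc.stdout:
        sys.stdout.write(line)         # forward each line immediately
    ret = proc.wait()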
To study and improve async flush performance in SCR, this work extends ``test_api.c`` to execute various work kernels. By focusing on particular kinds of operations, e.g., CPU-intensive, memory-intensive, network-intensive, etc....
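To illustrate the kinds of kernels meant (the real ones would be C code compiled into ``test_api.c``), here is a rough Python sketch of a CPU-intensive and a memory-intensive loop::

    import time

    # Rough illustrations only; the actual kernels would run in C
    # between checkpoints inside test_api.c.

    def cpu_kernel(seconds):
        """Spin on arithmetic to keep the CPU busy."""
        end = time.time() + seconds
        x = 1
        while time.time() < end:
            x = (x * 1103515245 + 12345) % (1 << 31)
        return x

    def memory_kernel(nbytes):
        """Touch a large buffer to stress memory bandwidth."""
        buf = bytearray(nbytes)
        for i in range(0, nbytes, 4096):   # write one byte per page
            buf[i] = 1
        return buf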
When running with ``SCR_DEBUG=1``, SCR prints log messages to ``stdout``. It would be useful at times to direct those messages elsewhere, such as to ``stderr`` or perhaps to a user-provided file...
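The requested behavior might look like the sketch below, where ``SCR_DEBUG_FILE`` is a hypothetical setting invented here purely for illustration::

    import os
    import sys

    # SCR_DEBUG_FILE is hypothetical; today SCR writes only to stdout.
    def debug_stream():
        target = os.environ.get('SCR_DEBUG_FILE', 'stdout')
        if target == 'stdout':
            return sys.stdout
        if target == 'stderr':
            return sys.stderr
        return open(target, 'a')   # user-provided file path

    print('SCR: some debug message', file=debug_stream())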
One can delete files from the parallel file system either by calling ``SCR_Delete()`` or by setting ``SCR_PREFIX_SIZE=N``, in which case SCR maintains a sliding window of the ``N`` most recent...
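The sliding-window policy amounts to keeping the ``N`` newest checkpoints and deleting anything older. A generic sketch of that policy, not SCR's actual implementation::

    # Generic sliding-window deletion policy.
    def apply_window(checkpoints, n):
        """Given checkpoints ordered oldest to newest, return the
        ones to delete so only the N most recent remain."""
        if n <= 0 or len(checkpoints) <= n:
            return []
        return checkpoints[:-n]

    # e.g. with SCR_PREFIX_SIZE=2 and four completed checkpoints:
    print(apply_window(['ckpt.1', 'ckpt.2', 'ckpt.3', 'ckpt.4'], 2))
    # -> ['ckpt.1', 'ckpt.2']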
After writing a checkpoint to the parallel file system, a later job attempts to restart. SCR detects that the checkpoint exists, but it fails when trying to fetch the files....
SCR currently allows an application to restart with a different number of ranks. However, one cannot call the SCR restart API in that case (see https://scr.readthedocs.io/en/latest/users/integration.html#restart-without-scr). This is awkward for applications...
After an async flush has started, an application must make another SCR call to finalize that flush. Even after the async flush has copied all files, the output set is...
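The shape of the problem resembles a background copy that only becomes complete via a later, explicit call. A generic Python sketch of that pattern, illustrative only and not SCR's API::

    import shutil
    import threading

    # Generic shape of the issue: a background copy that is only
    # marked complete by a later, explicit finalize call.
    class AsyncFlush:
        def __init__(self, src, dst):
            self.complete = False
            self._thread = threading.Thread(
                target=shutil.copyfile, args=(src, dst))
            self._thread.start()

        def finalize(self):
            # Even if the copy finished long ago, the output set
            # stays incomplete until the application calls this.
            self._thread.join()
            self.complete = True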
Ongoing work to improve async flush performance and usability. Related issue: https://github.com/LLNL/scr/issues/531
The ``keras_utils.py`` code requires ``print()`` to be treated as a function.
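Under Python 2 that is achieved with the standard ``__future__`` import at the top of the module::

    # At the top of keras_utils.py, before any other statements,
    # so that print() is a function even under Python 2:
    from __future__ import print_function

    print('now a function call on both Python 2 and 3')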