Pav atomic renaming broken on some NFS

Open Paul-Ferrell opened this issue 2 years ago • 0 comments

In several places, Pavilion writes a file under a temporary name and then renames it atomically. Some NFS filesystems don't handle that well, and the rename fails with a FileNotFoundError.

Each of those renames needs to be handled so that the system gets a bit of time (and at least one context switch) to catch up.

Unknown error running command _run. sequence item 0: expected str instance,
FileNotFoundError found
Traceback (most recent call last):
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/variables.py", line 610, in save
    tmp_path.rename(path)
  File "/collab/usr/gapps/python/build/spack-toss4.1/var/spack/environments/python/._view/75prb56irmif5ejtirjthpx6kq3gqo52/lib/python3.9/pathlib.py", line 1382, in rename
    self._accessor.rename(self, target)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/workspace/sly1/git/pavilion2-super/working_dir/test_runs/471/variables.tmp' -> '/usr/workspace/sly1/git/pavilion2-super/working_dir/test_runs/471/variables'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/main.py", line 111, in run_cmd
    return cmd.run(pav_cfg, args)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/commands/_run.py", line 64, in run
    test.finalize(var_man)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/test_run/test_run.py", line 441, in finalize
    self.var_man.save(self._variables_path)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/variables.py", line 612, in save
    raise VariableError(
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/errors.py", line 158, in __init__
    key = '.'.join(key)
TypeError: sequence item 0: expected str instance, FileNotFoundError found

Note that the final TypeError is a secondary failure: the VariableError raised in variables.py ends up joining a key that contains the FileNotFoundError, which hides the original rename error behind "Unknown error running command _run".

This seems to work:

        try:
            with tmp_path.open('w') as config_file:
                yaml.dump(config, config_file)
        except (OSError, IOError) as err:
            raise TestRunError(
                "Could not save TestRun ({}) config at {}"
                .format(self.name, self.path), err)
        except TypeError as err:
            raise TestRunError(
                "Invalid type in config for ({})"
                .format(self.name), err)

        try:
            config_path.unlink()
        except (OSError, FileNotFoundError):
            pass

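        # Retry the rename for up to 0.1s. The sleep yields the CPU so the
        # filesystem gets a chance to catch up.
        start = time.time()
        while time.time() - start < 0.1:
            try:
                tmp_path.rename(config_path)
                # Stop on success; retrying would only fail again because
                # the tmp file no longer exists.
                break
            except FileNotFoundError:
                time.sleep(0.01)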
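
Since the same write-then-rename pattern shows up in several places, the retry could probably live in one shared helper. Below is a minimal sketch of that idea, not actual Pavilion code: the name `rename_with_retry` and its `timeout`/`interval` parameters are hypothetical, with defaults taken from the loop above. Unlike that loop, it re-raises the FileNotFoundError if the rename still fails after the grace period, so the failure isn't silently dropped. It also skips the unlink step on the assumption that a POSIX rename replaces an existing target; keep the unlink if it turns out to matter on the affected filesystems.

```python
import time
from pathlib import Path


def rename_with_retry(src: Path, dest: Path,
                      timeout: float = 0.1, interval: float = 0.01) -> None:
    """Rename src to dest, retrying briefly on FileNotFoundError.

    Hypothetical helper (not part of Pavilion). The defaults mirror the
    0.1s / 0.01s values used in the workaround above.
    """

    deadline = time.monotonic() + timeout
    while True:
        try:
            # On POSIX, rename replaces an existing dest in one step.
            src.rename(dest)
            return
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                # Still failing after the grace period; surface the error.
                raise
            # Sleeping gives up the CPU (at least one context switch)
            # so the filesystem has a chance to settle.
            time.sleep(interval)
```

A call site would then shrink to something like `rename_with_retry(tmp_path, config_path)`, wrapped in whatever error handling that module already uses.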

Paul-Ferrell, Jul 20 '23 20:07