pavilion2
Pav atomic renaming broken on some NFS
In several places Pavilion writes a file under a tmp name, then renames it into place atomically. On some NFS filesystems that rename fails with a FileNotFoundError.
Every one of those renames needs to handle this by giving the system a bit of time (and at least one context switch) before trying again.
Unknown error running command _run. sequence item 0: expected str instance, FileNotFoundError found
Traceback (most recent call last):
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/variables.py", line 610, in save
    tmp_path.rename(path)
  File "/collab/usr/gapps/python/build/spack-toss4.1/var/spack/environments/python/._view/75prb56irmif5ejtirjthpx6kq3gqo52/lib/python3.9/pathlib.py", line 1382, in rename
    self._accessor.rename(self, target)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/workspace/sly1/git/pavilion2-super/working_dir/test_runs/471/variables.tmp' -> '/usr/workspace/sly1/git/pavilion2-super/working_dir/test_runs/471/variables'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/main.py", line 111, in run_cmd
    return cmd.run(pav_cfg, args)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/commands/_run.py", line 64, in run
    test.finalize(var_man)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/test_run/test_run.py", line 441, in finalize
    self.var_man.save(self._variables_path)
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/variables.py", line 612, in save
    raise VariableError(
  File "/usr/WS2/sly1/git/pavilion2-super/pavilion2/lib/pavilion/errors.py", line 158, in __init__
    key = '.'.join(key)
TypeError: sequence item 0: expected str instance, FileNotFoundError found
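Side note on the second traceback: the TypeError hides the real FileNotFoundError. Going only by the traceback, the caught exception appears to have ended up inside the `key` sequence that errors.py joins with '.'. I have not checked the actual `VariableError` signature, so the sketch below only reproduces the join failure and shows one defensive option. `join_key` is a hypothetical stand-in, not a Pavilion function.

```python
def join_key(key):
    """Join key parts with '.', coercing each part to str first.

    Hypothetical stand-in for the `'.'.join(key)` line in errors.py.
    """
    return '.'.join(str(part) for part in key)


err = FileNotFoundError(2, "No such file or directory")

# A plain join fails as soon as one item is not a str, which is the
# TypeError at the bottom of the traceback.
try:
    '.'.join([err])
except TypeError as exc:
    print(exc)

# Coercing each part keeps the error path itself from raising.
print(join_key(['var_set', err]))
```

The cleaner fix is probably at the call site in variables.py (passing the exception as the prior error rather than as a key), but that depends on the constructor's real arguments.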
Retrying the rename for a short time seems to work:
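# Needs `import time` and `import yaml` at module level.
try:
    with tmp_path.open('w') as config_file:
        yaml.dump(config, config_file)
except (OSError, IOError) as err:
    raise TestRunError(
        "Could not save TestRun ({}) config at {}"
        .format(self.name, self.path), err)
except TypeError as err:
    raise TestRunError(
        "Invalid type in config for ({})"
        .format(self.name), err)

# Clear out any existing config file first.
try:
    config_path.unlink()
except (OSError, FileNotFoundError):
    pass

# Retry the rename for up to 0.1 s. The sleep gives NFS time to catch
# up and forces at least one context switch.
start = time.time()
while time.time() - start < 0.1:
    try:
        tmp_path.rename(config_path)
    except FileNotFoundError:
        time.sleep(0.01)
        continue
    # The rename succeeded. Without this break the loop would try the
    # rename again and keep failing until the timeout.
    break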
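Since the tmp-then-rename pattern shows up in several places, the retry could live in one shared helper. A minimal sketch, assuming the same 0.1 s window and 0.01 s sleep as above. `rename_with_retry` and its parameters are hypothetical names, not existing Pavilion code.

```python
import time
from pathlib import Path


def rename_with_retry(src: Path, dest: Path,
                      timeout: float = 0.1, interval: float = 0.01) -> None:
    """Rename src to dest, retrying briefly on FileNotFoundError.

    Hypothetical helper. The timeout and interval defaults mirror the
    values in the snippet above and are otherwise untuned.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            src.rename(dest)
            return
        except FileNotFoundError:
            # Out of time: re-raise so the caller can wrap the error.
            if time.monotonic() >= deadline:
                raise
            # Sleeping yields the CPU, so this is also the context switch.
            time.sleep(interval)
```

Unlike the loop above, this re-raises the last FileNotFoundError once the timeout expires instead of giving up silently, so callers such as `save` in variables.py could keep their existing error handling around a single call.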