
Log Directory Issue in wrap_experiment

Open ahalev opened this issue 3 years ago • 4 comments

There appears to be a bug in wrap_experiment: the step that archives the launcher's git repo (make_launcher_archive) fails because tar cannot open the output file.

Traceback:

tar (child): data/local/experiment/PONGNoFrameskip-v4_2/launch_archive.tar.xz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: data/local/experiment/PONGNoFrameskip-v4_2/launch_archive.tar.xz: Cannot write: Broken pipe
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Traceback (most recent call last):
  File "/home/ahalev/.conda/envs/remote_env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ahalev/.conda/envs/remote_env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ahalev/.conda/envs/remote_env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ahalev/repos/garage/src/garage/examples/torch/dqn_atari.py", line 103, in main
    **hyperparams)
  File "/home/ahalev/repos/garage/src/garage/experiment/experiment.py", line 368, in __call__
    ctxt = self._make_context(self._get_options(*args), **kwargs)
  File "/home/ahalev/repos/garage/src/garage/experiment/experiment.py", line 324, in _make_context
    make_launcher_archive(git_root_path=git_root_path, log_dir=log_dir)
  File "/home/ahalev/repos/garage/src/garage/experiment/experiment.py", line 559, in make_launcher_archive
    check=True)
  File "/home/ahalev/.conda/envs/remote_env/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '('tar', '--null', '--files-from', '-', '--xz', '--create', '--file', 'data/local/experiment/PONGNoFrameskip-v4_2/launch_archive.tar.xz')' returned non-zero exit status 2.
python-BaseException

Steps to reproduce:

Run

python examples/torch/dqn_atari.py PONG

My garage package info:

Metadata-Version: 2.1
Name: garage
Version: 2020.9.0rc2.dev0

ahalev avatar Sep 30 '21 01:09 ahalev

That's strange. It might be best if we change the default argument of archive_launch_repo to False. I imagine this could be caused by the data directory not being writable, or perhaps by tar not being able to use xz. The tar command itself looks correct, although it's also conceivable that git produced an unusual file list. There are several tests that check this feature, so it should work in this case.

krzentner avatar Sep 30 '21 02:09 krzentner

It's writable as far as I can tell -- if I run

if [ -w ` pwd ` ]; then echo "WRITABLE"; else echo "NOT WRITABLE"; fi

in $git_root_path$/data/local/experiment it spits out WRITABLE.

Not sure how to check whether tar can use xz.

ahalev avatar Sep 30 '21 02:09 ahalev

Oh, I see what happened. wrap_experiment expects log_dir to be an absolute path (and sets it to an absolute path by default), but this example explicitly sets it to a relative path. The tar command is always run in the git repo root, so if the example is run from inside a git repo but not from its root, the relative log directory doesn't exist from tar's point of view and the tar command fails.

Probably ExperimentWrapper should always convert log_dir to an absolute path when it isn't one already.
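
A minimal sketch of the mismatch, using only the standard library and no garage code. The directory layout, the run_1 name, and the resolve_log_dir helper are hypothetical stand-ins for illustration, not garage's actual implementation:

```python
import os
import tempfile


def resolve_log_dir(log_dir, launch_cwd):
    """Hypothetical helper: anchor a relative log_dir at the launch directory."""
    if os.path.isabs(log_dir):
        return log_dir
    return os.path.normpath(os.path.join(launch_cwd, log_dir))


with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    git_root = os.path.join(tmp, "repo")                # stand-in for the git repo root
    launch_cwd = os.path.join(git_root, "src", "pkg")   # assumed launch directory below the root
    log_dir = os.path.join("data", "local", "experiment", "run_1")  # relative path

    # The log directory is created relative to where the script was launched.
    os.makedirs(os.path.join(launch_cwd, log_dir))

    # A command run from the git root resolves the same relative path elsewhere.
    seen_from_git_root = os.path.join(git_root, log_dir)
    exists_for_tar = os.path.isdir(seen_from_git_root)

    # Converting to an absolute path first removes the dependence on the cwd.
    fixed = resolve_log_dir(log_dir, launch_cwd)
    exists_when_absolute = os.path.isdir(fixed)

print(exists_for_tar, exists_when_absolute)  # False True
```

With an absolute path, it no longer matters which working directory the archive command runs in.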

krzentner avatar Sep 30 '21 02:09 krzentner

Yes, it works with archive_launch_repo=False. Bizarrely, I copied the entire content of dqn_atari.py to a different folder outside of the repo and it works there with archive_launch_repo=True.

ahalev avatar Oct 01 '21 01:10 ahalev