RunPod template not working with network volumes, /workspace/axolotl empty

Open Palmik opened this issue 2 years ago • 2 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Other users also encountered this: https://github.com/OpenAccess-AI-Collective/axolotl/issues/467

According to the RunPod Discord, one needs to run rsync explicitly in the Docker image, like here: https://github.com/runpod/containers/blob/main/official-templates/stable-diffusion-webui/pre_start.sh
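A minimal Python sketch of that kind of sync step, assuming the likely cause is that a network volume mounted at /workspace hides the files the image placed there. The function name `restore_if_empty` and the idea of keeping a second copy of the repository outside /workspace are assumptions for illustration, not part of the axolotl image or the linked script.

```python
import shutil
from pathlib import Path


def restore_if_empty(image_copy: Path, workspace_dir: Path) -> bool:
    """Copy image_copy into workspace_dir when the latter is missing or empty.

    Returns True if files were copied, False if workspace_dir already had content.
    """
    if workspace_dir.is_dir() and any(workspace_dir.iterdir()):
        # The volume already holds files; leave the user's data untouched.
        return False
    # dirs_exist_ok (Python 3.8+) allows copying into an existing, empty mount point.
    shutil.copytree(image_copy, workspace_dir, dirs_exist_ok=True)
    return True
```

At container start this could be called as `restore_if_empty(Path("/opt/axolotl"), Path("/workspace/axolotl"))`, where `/opt/axolotl` is a hypothetical location for the baked-in copy.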

Current behaviour

The /workspace/axolotl directory is sometimes empty

Steps to reproduce

Use the official axolotl main-latest image on RunPod.

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • [ ] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

N/A

axolotl branch-commit

N/A

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

Palmik, Nov 02 '23 18:11

You can avoid this problem by changing the RunPod template's persistent volume mount point from /workspace to /workspace/axolotl/data. The RunPod template doesn't appear to live in the repository, or I'd be happy to open a PR for the change. I chatted with @winglian on Discord, and they have access to change the template. @winglian also suggested that we change the output_dir in the example yml files. I haven't tried axolotl with any other cloud GPU providers. Does changing the output_dir make sense on other cloud GPU providers? If so, I can open a PR for that change.
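As a rough illustration of the output_dir change, here is a small Python sketch that re-roots a config's output_dir under the proposed mount point. The helper `rebase_output_dir` and the `output` fallback name are hypothetical, and the sketch keeps only the final path component of the original value.

```python
from pathlib import PurePosixPath


def rebase_output_dir(config: dict, data_root: str = "/workspace/axolotl/data") -> dict:
    """Return a copy of config whose output_dir lives directly under data_root."""
    original = PurePosixPath(config.get("output_dir", "output"))
    # Keep only the last component so "./lora-out" and "/tmp/lora-out" both
    # become "<data_root>/lora-out"; fall back to "output" for "." or "".
    name = original.name or "output"
    updated = dict(config)  # shallow copy; the caller's dict is not modified
    updated["output_dir"] = str(PurePosixPath(data_root) / name)
    return updated
```

For example, `rebase_output_dir({"output_dir": "./lora-out"})` yields an output_dir of `/workspace/axolotl/data/lora-out`, so checkpoints would land on the persistent volume.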

mattflo, Feb 15 '24 15:02

Just one caveat to this: an older issue wanted the HF models and datasets to be downloaded to the volume. If you change the above, the user should override these values:

https://github.com/OpenAccess-AI-Collective/axolotl/blob/89134f2143cd3325802813eb97cd05c783932201/docker/Dockerfile-cloud#L4-L7
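A hedged Python sketch of such an override, assuming the linked lines set Hugging Face cache environment variables. `HF_HOME`, `HF_DATASETS_CACHE`, and `HUGGINGFACE_HUB_CACHE` are real variables read by the Hugging Face libraries, but whether they match the Dockerfile exactly is an assumption, as are the helper names and the `huggingface-cache` directory name.

```python
import os


def hf_cache_env(volume_root: str) -> dict:
    """Build Hugging Face cache variables pointing at a directory on the volume."""
    cache = f"{volume_root.rstrip('/')}/huggingface-cache"  # hypothetical layout
    return {
        "HF_HOME": cache,
        "HF_DATASETS_CACHE": f"{cache}/datasets",
        "HUGGINGFACE_HUB_CACHE": f"{cache}/hub",
    }


def apply_hf_cache_env(volume_root: str) -> None:
    """Export the variables into the current process environment."""
    os.environ.update(hf_cache_env(volume_root))
```

The variables need to be set before the Hugging Face libraries are imported, for example by calling `apply_hf_cache_env("/workspace/axolotl/data")` at the top of a launcher script or by exporting the same values in the pod's environment settings.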

NanoCode012, Mar 30 '24 17:03