axolotl
RunPod template not working with network volumes, /workspace/axolotl empty
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Other users also encountered this: https://github.com/OpenAccess-AI-Collective/axolotl/issues/467
According to the RunPod Discord, the Docker image has to run an explicit rsync at startup, as in this example:
https://github.com/runpod/containers/blob/main/official-templates/stable-diffusion-webui/pre_start.sh
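For illustration, here is a rough Python sketch of the same idea: at container start, copy a baked-in copy of the repo onto the mounted volume if the target directory is empty. This is not what the linked pre_start.sh does line for line, and the `/opt/axolotl` source path and the `restore_if_empty` helper are hypothetical. It assumes the image keeps a copy of the repo somewhere outside /workspace.

```python
import shutil
from pathlib import Path


def restore_if_empty(baked_copy: Path, mount_target: Path) -> bool:
    """Copy baked_copy into mount_target when the target is missing or empty.

    Returns True if a copy was made, False if the target already had content.
    """
    if mount_target.is_dir() and any(mount_target.iterdir()):
        return False  # the volume already holds files; leave them alone
    # dirs_exist_ok lets us copy into an existing (empty) mount point.
    shutil.copytree(baked_copy, mount_target, dirs_exist_ok=True)
    return True


if __name__ == "__main__":
    # Hypothetical path: assumes the image stores a copy outside /workspace.
    src = Path("/opt/axolotl")
    if src.is_dir():
        restore_if_empty(src, Path("/workspace/axolotl"))
```

Skipping the copy when the target already has files keeps anything the user saved on the volume from being overwritten on the next pod start.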
Current behaviour
The /workspace/axolotl directory is sometimes empty when the pod uses a network volume.
Steps to reproduce
Use the official axolotl main-latest image on RunPod.
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
- [ ] Linux
- [ ] macOS
- [ ] Windows
Python Version
N/A
axolotl branch-commit
N/A
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
You can avoid this problem by changing the RunPod template's persistent volume mount point from /workspace to /workspace/axolotl/data. It appears the RunPod template doesn't live in the repository; otherwise I'd be happy to open a PR for the change. I chatted with @winglian on Discord, and they have access to change the template. @winglian also suggested we change the output_dir in the example yml files. I haven't tried axolotl with any other cloud GPU providers. Does changing the output_dir make sense on other cloud GPU providers? If so, I can open a PR for that change.
Just one caveat to this: an older issue asked for the HF models and datasets to be downloaded to the volume. If you change the mount point as described above, the user should override these values:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/89134f2143cd3325802813eb97cd05c783932201/docker/Dockerfile-cloud#L4-L7
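A minimal sketch of what such an override could look like from Python, assuming the values in question are the usual Hugging Face cache environment variables (the linked Dockerfile lines are not reproduced here, so the variable list is an assumption). The `huggingface-cache` directory name and the `point_hf_caches_at` helper are hypothetical.

```python
import os
from pathlib import Path


def point_hf_caches_at(volume_root: str) -> dict:
    """Set Hugging Face cache env vars to directories under volume_root.

    Returns the mapping that was applied to os.environ.
    """
    # Hypothetical layout: one cache directory on the persistent volume.
    cache_root = Path(volume_root) / "huggingface-cache"
    overrides = {
        "HF_DATASETS_CACHE": str(cache_root / "datasets"),
        "HUGGINGFACE_HUB_CACHE": str(cache_root / "hub"),
        "TRANSFORMERS_CACHE": str(cache_root / "hub"),
    }
    os.environ.update(overrides)
    return overrides


if __name__ == "__main__":
    # Assumes the volume is mounted at the path suggested in this thread.
    point_hf_caches_at("/workspace/axolotl/data")
```

This would need to run before `datasets` or `transformers` are imported, since those libraries read the cache locations at import time. Setting the same variables in the pod's environment settings should have the same effect.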