Move away from using `/tmp` directories
Using /tmp to store temporary outputs such as model checkpoints, tokenizers, logs, etc., is not good practice: /tmp is often wiped, and in remote environments it is shared across users and subject to stricter permissions. We should update all example configs and download scripts to move away from /tmp and instead use something similar to ~/.cache/torch/hub or $TORCH_HOME: https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved
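For reference, torch.hub resolves its cache directory as $TORCH_HOME if set, else $XDG_CACHE_HOME/torch, else ~/.cache/torch. A minimal sketch of the same lookup order for a dedicated torchtune directory could look like the following. The TORCHTUNE_HOME variable and the get_torchtune_home / get_output_dir helpers are hypothetical names used for illustration, not an existing torchtune API.

```python
import os
from pathlib import Path

# Hypothetical environment variable; torchtune is not assumed to define it.
ENV_TORCHTUNE_HOME = "TORCHTUNE_HOME"
ENV_XDG_CACHE_HOME = "XDG_CACHE_HOME"
DEFAULT_CACHE_DIR = "~/.cache"


def get_torchtune_home() -> Path:
    """Resolve a per-user cache dir using the same precedence as torch.hub:

    1. $TORCHTUNE_HOME, if set
    2. $XDG_CACHE_HOME/torchtune, if XDG_CACHE_HOME is set
    3. ~/.cache/torchtune otherwise
    """
    default = os.path.join(
        os.getenv(ENV_XDG_CACHE_HOME, DEFAULT_CACHE_DIR), "torchtune"
    )
    return Path(os.path.expanduser(os.getenv(ENV_TORCHTUNE_HOME, default)))


def get_output_dir(subdir: str) -> Path:
    """Return a subdirectory (e.g. for checkpoints or logs), creating it if needed."""
    path = get_torchtune_home() / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Because the lookup falls back to a sensible default at runtime, nothing needs to be set at install time; a user whose persistent storage lives elsewhere (for example under /workspace) could add `export TORCHTUNE_HOME=/workspace/torchtune` to their shell profile.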
cc @NicolasHug @kartikayk
@joecummings we probably need to address this as part of tune download to make sure we aren't downloading to /tmp. I can look at this on the checkpointing side as part of recipe UX.
Tracking this in #691
@kartikayk this isn't related to testing, and it isn't a "quality of life" thing; it's user-facing. It should probably not be part of https://github.com/pytorch/torchtune/issues/691.
OK, yeah. I think I read this as addressing the /tmp/artifacts issue, but that shouldn't be a problem anymore. For downloading the models and files, I think we can find a better placeholder path, but I haven't seen any issues or complaints about this.
Why not create a dedicated cache folder and set it as an environment variable on install?
@RdoubleA how would we set an env variable on install in a persistent way?
@kartikayk should this issue be re-opened?
I'm not sure torchtune needs to do much more than what is already done by https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved
Why is what torch hub does a good idea? The models and other files we're downloading all come from HF, so why do we need to do what torch hub does here? IIUC, HF already does a bunch of caching in the background.
I don't understand.
The original comment in this issue makes sense to me, i.e. if torchtune is downloading or storing stuff, it should probably not be in /tmp and rather be in a dedicated folder with a similar configuration logic to that of torchhub. Is this not relevant anymore?
> if torchtune is downloading or storing stuff, it should probably not be in /tmp and rather be in a dedicated folder with a similar configuration logic to that of torchhub
Sorry, I don't understand this comment. Can you share more? As I mentioned, we use the HF download API, so I'm not sure why I should set things up the way torch hub does. But maybe I'm missing something here.
On RunPod, all of the volume storage is on /workspace, and that's where I download to. I don't see the value in setting up some of the stuff mentioned above, but again, maybe I'm missing something.
The premise of this issue is that torchtune is storing stuff somewhere, and that:
- "somewhere" is /tmp
- "stuff" denotes models, temporary outputs, logs, checkpoints, etc.
Is this not the case?
Closing this as stale, since no issues have been raised about it.