clearml
[OSError] no space left on device
After running a multi-step pipeline successfully, I reran it with no code changes. This time one step (training, which involves multiprocessing) throws the error message [OSError] no space left on device. I solved the problem by deleting the files that ClearML generated in the /tmp folder. If this is expected behavior for a ClearML pipeline, is there a way to avoid this overhead?
Hi @Waerden001,
What files did you delete exactly?
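One way to answer that question before deleting anything is to list what is taking up space in the temp directory. The following is only a minimal sketch using the Python standard library; the helper names `dir_size` and `largest_entries` are made up for illustration and are not part of ClearML. It assumes the files of interest live under the directory returned by `tempfile.gettempdir()` (usually /tmp on Linux, or `$TMPDIR` if set).

```python
import os
import shutil
import tempfile


def dir_size(path):
    """Total size in bytes of the files under path, skipping unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable
    return total


def largest_entries(base, top=10):
    """Return up to `top` (size_bytes, path) pairs for the direct children of base, largest first."""
    sizes = []
    with os.scandir(base) as it:
        for entry in it:
            try:
                if entry.is_dir(follow_symlinks=False):
                    size = dir_size(entry.path)
                else:
                    size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue  # entry vanished or is unreadable
            sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    base = tempfile.gettempdir()
    usage = shutil.disk_usage(base)
    print(f"{base}: {usage.free / 2**30:.2f} GiB free of {usage.total / 2**30:.2f} GiB")
    for size, path in largest_entries(base):
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

Running this inside the failing step's environment (or the pod) right after the error should show which directories are filling the device, which makes it easier to say exactly which files were involved.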
@jkhenning I am also facing a similar issue. I suspect that setting up the container and running the applications may have written to the pod's /tmp. From what I read, /tmp is on tmpfs by default and is therefore limited by the memory of the node.
Would mounting an emptyDir volume at /tmp help? So far I have not found a way to add one to the clearml-agent running on k8s. Any advice?
Hi @okyspace, yes, I assume mounting /tmp would help. I suspect this is the agent's log being written there.
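For reference, the emptyDir idea discussed above could be sketched as a small change to a pod template. This is a hedged illustration, not ClearML's own mechanism: the helper `with_tmp_emptydir`, the volume name `tmp-scratch`, the 10Gi size limit, and the example container name and image are all assumptions. Only the `volumes`, `emptyDir`, `sizeLimit`, `volumeMounts`, and `mountPath` fields come from the Kubernetes pod spec. How the resulting template is handed to the agent depends on the deployment and is not shown here.

```python
import copy
import json


def with_tmp_emptydir(pod_template, size_limit="10Gi", volume_name="tmp-scratch"):
    """Return a copy of pod_template with an emptyDir volume mounted at /tmp in every container."""
    out = copy.deepcopy(pod_template)  # leave the caller's template untouched
    spec = out.setdefault("spec", {})
    # An emptyDir without medium: Memory is backed by node storage rather than RAM.
    spec.setdefault("volumes", []).append(
        {"name": volume_name, "emptyDir": {"sizeLimit": size_limit}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": volume_name, "mountPath": "/tmp"}
        )
    return out


if __name__ == "__main__":
    # Hypothetical base template; replace with the one your agent actually uses.
    base = {"spec": {"containers": [{"name": "task", "image": "python:3.10"}]}}
    # JSON is also valid YAML, so the output can be saved as a pod template file.
    print(json.dumps(with_tmp_emptydir(base), indent=2))
```

The size limit is worth setting deliberately: once the volume exceeds it, Kubernetes may evict the pod, so it should be larger than the temporary data a training step is expected to write.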