Running Stable Diffusion fails with a "no space left on device" error
Please see the error below. I have a 28 TB disk; how can I work around this problem?
=> => sha256:f0d07955f406a98d2a3ef5cd3cb1c559339dc6cbbbe209914ad33ad7141df159 54.92MB / 54.92MB 58.7s
=> => sha256:63edc579ff7da32c44f2ecbb8cebdb8c0bc75afea5c55bfc40ca0faa1f25b971 15.71kB / 15.71kB 58.9s
=> => sha256:8abf5ec0bb1316cac46570b2bc2d49b8628ce20f17ca13f7e77b260e5fc3da6c 512B / 512B 59.0s
=> => extracting sha256:a0e57127409620ffbd134fed398941297a33e6ac6666f11a8112b9912fa9c134 65.1s
=> => extracting sha256:03a691138ef8873ae8f244bfae84a8f425d6d970d9eb9e02025f09cf3e6ff73e 0.0s
=> => extracting sha256:9684ed3c71177bfaf2b3dd14a7f4f396e574be10c175a8eaafeddf71db897ca6 0.0s
=> => extracting sha256:c79b46181ee684152af9c05fbf145fd65a879155efe3656904c6782f738ce5a2 0.0s
=> => extracting sha256:91b6a7918caa109af3526868ecd34dd58368c1612f97b09c87d547fc550670c2 0.0s
=> => extracting sha256:d3ed2fbe7c334026828edfa722cc573ae82bf47aa8f3796d048a4e45c0021743 2.2s
=> => extracting sha256:79a28dd493566ad8f77a68bdb8827f450b8e08f6e8cbc0a97fd07b0d9f3e5f59 3.9s
=> => extracting sha256:02148fc97997080a40525c593dd6412858d7daf2ebc3fcfd31d113f3c2ce9fff 0.0s
=> => extracting sha256:6471a4765a47fc71de42baa86e1da4c9c6cfbd2cd60d683175b12602ae342061 17.4s
=> => extracting sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 0.0s
=> => extracting sha256:824e6993c2a6f049af207439226b8f6cea693d6ef5f8751e52dcff31997451c0 4.0s
=> => extracting sha256:bcd2dfdccd094718505d87d069d9c0682d8d6285ba39bc5ab7b72a1281d63075 0.8s
=> => extracting sha256:499d104c2b1eb9e3bf33835377ec125a657b5f55e42f0ea88d25a92426dff428 1.4s
------
> [1/5] FROM nvcr.io/nvidia/pytorch:22.12-py3@sha256:09a80f272dd173c9d8f28c23a1985aebe2bd3edd41a184ee9634f6e3f8a1f63d:
------
Dockerfile:2
--------------------
1 | ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:22.12-py3
2 | >>> FROM ${FROM_IMAGE_NAME}
3 |
4 | ENV DEBIAN_FRONTEND=noninteractive
--------------------
ERROR: failed to solve: failed to register layer: write /usr/local/lib/python3.8/dist-packages/cmake/data/bin/ctest: no space left on device
dcg@oq1:/mnt/nvme1n1/mlperf/ubuntu/training/stable_diffusion$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 13G 3.0M 13G 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 98G 84G 9.7G 90% /
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda2 2.0G 217M 1.6G 12% /boot
/dev/sda1 1.1G 6.1M 1.1G 1% /boot/efi
tmpfs 13G 8.0K 13G 1% /run/user/1000
/dev/nvme1n1 28T 288G 28T 2% /mnt/nvme1n1
dcg@oq1:/mnt/nvme1n1/mlperf/ubuntu/training/stable_diffusion$
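The base Docker image nvcr.io/nvidia/pytorch:22.12-py3 is over 18 GB. You can use docker info to check where Docker stores its images (/var/lib/docker/overlay2 on Debian-based systems), but I can see you have only 9.7 GB available on your root filesystem. Your 28 TB disk is mounted at /mnt/nvme1n1, so Docker will not use it unless its storage directory is moved there.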
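For anyone hitting the same error, here is a minimal Python sketch of the check described above. It asks Docker for its storage directory, measures the free space there, and prints a candidate daemon configuration that points Docker at a larger disk. The helper names, the 40 GiB threshold, and the /mnt/nvme1n1/docker target directory are assumptions for illustration; only the mount point /mnt/nvme1n1 comes from the df output in this thread.

```python
import json
import shutil
import subprocess

GIB = 1024 ** 3


def docker_root_dir() -> str:
    """Ask the Docker daemon where it stores images and layers."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.DockerRootDir}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path: str, required_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib


def daemon_config(data_root: str) -> str:
    """JSON for /etc/docker/daemon.json that relocates Docker's storage."""
    return json.dumps({"data-root": data_root}, indent=2)


if __name__ == "__main__":
    # Assumed headroom: the image is over 18 GB, and extraction needs more.
    required_gib = 40
    try:
        root = docker_root_dir()
        print(f"Docker root: {root} ({free_gib(root):.1f} GiB free)")
        if not has_room(root, required_gib):
            print("Not enough space. Candidate /etc/docker/daemon.json:")
            # Hypothetical directory on the 28 TB mount from this thread.
            print(daemon_config("/mnt/nvme1n1/docker"))
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Could not query Docker: {exc}")
```

If you adopt a configuration like this, the Docker daemon has to be restarted before it takes effect, and images already stored under the old directory are not moved automatically.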
@gaowayne if your issue is resolved, can you please close this?
Closing due to inactivity. Please create a new issue if the problem persists.