
Out of disk space on dockerized shared hosts

Open richardlau opened this issue 4 years ago • 31 comments

Reported via Slack:

From @Trott :

Example

10:51:23  > git config remote.origin.url git@github.com:nodejs/node.git # timeout=10
10:51:23 ERROR: Error fetching remote repo 'origin'
10:51:23 hudson.plugins.git.GitException: Failed to fetch from git@github.com:nodejs/node.git
10:51:23 	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:996)
10:51:23 	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1237)

Different example

11:01:46 collect2: fatal error: ld terminated with signal 9 [Killed]
11:01:46 compilation terminated.
11:01:46 cctest.target.mk:231: recipe for target '/home/iojs/build/workspace/node-test-commit-linux-containered/out/Debug/cctest' failed
11:01:46 make[2]: *** [/home/iojs/build/workspace/node-test-commit-linux-containered/out/Debug/cctest] Error 1
11:01:46 make[2]: *** Waiting for unfinished jobs....
11:02:58 rm fa309b4689a758e9e8a16895fbbf2b4922a45c96.intermediate
11:02:58 Makefile:104: recipe for target 'node_g' failed
11:02:58 make[1]: *** [node_g] Error 2
11:02:58 Makefile:530: recipe for target 'build-ci' failed
11:02:58 make: *** [build-ci] Error 2

From @danielleadams :

I’m running into some issues with the pull-request jobs today (releasing 15.4.0). Different tests seem to be raising the same error with git, and they are inconsistently failing:

https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_debug_x64/23997/console
https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_withoutssl_x64/23997/console
https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_openssl110_x64/23999/console
https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_openssl111_x64/24008/
https://ci.nodejs.org/job/node-test-commit-linux-containered/nodes=ubuntu1804_sharedlibs_withoutssl_x64/24008/console
https://ci.nodejs.org/job/node-test-commit-linux/nodes=alpine-latest-x64/38744/console

Does anyone know how to address this?

richardlau avatar Dec 08 '20 19:12 richardlau

It looks like the disks are full. e.g. from https://ci.nodejs.org/job/node-test-commit-linux-containered/24011/nodes=ubuntu1804_sharedlibs_withoutssl_x64/console the relevant lines are:

18:51:23 Caused by: hudson.plugins.git.GitException: Command "git config remote.origin.url git@github.com:nodejs/node.git" returned status code 4:
18:51:23 stdout: 
18:51:23 stderr: error: failed to write new configuration file /home/iojs/build/workspace/node-test-commit-linux-containered/.git/config.lock
18:51:23 
...
18:51:23 FATAL: Unable to produce a script file
18:51:23 java.io.IOException: No space left on device
18:51:23 	at java.io.UnixFileSystem.createFileExclusively(Native Method)
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   94G     0 100% /
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs test-softlayer-*
4.0G	test-softlayer-alpine310_container-x64-1
2.3G	test-softlayer-alpine311_container-x64-1
2.3G	test-softlayer-alpine312_container-x64-1
3.7G	test-softlayer-alpine39_container-x64-1
2.2G	test-softlayer-ubi81_container-x64-1
2.5G	test-softlayer-ubuntu1604_arm_cross_container-x64-1
6.6G	test-softlayer-ubuntu1804_arm_cross_container-x64-1
208M	test-softlayer-ubuntu1804_container-x64-1
2.1G	test-softlayer-ubuntu1804_sharedlibs_container-x64-1
16G	test-softlayer-ubuntu1804_sharedlibs_container-x64-2
14G	test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.3G	test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.5G	test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs#

I'll try removing the 14G/16G workspaces.
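For picking out the outliers without reading the `du` listing by eye, something like the following could work. It is only a sketch: the function name `large_workspaces` and the threshold are made up for illustration, and it assumes one directory per container directly under the base directory, as in the listing above.

```shell
# Hypothetical helper: list per-container directories above a size threshold.
# Usage: large_workspaces BASE_DIR THRESHOLD_KB
large_workspaces() {
  local base=$1 threshold_kb=$2 dir size_kb
  for dir in "$base"/*/; do
    # If the glob matched nothing it stays literal; skip it.
    [ -d "$dir" ] || continue
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    size_kb=$(du -sk -- "$dir" | cut -f1)
    if [ "$size_kb" -gt "$threshold_kb" ]; then
      printf '%s\t%s\n' "$size_kb" "${dir%/}"
    fi
  done
}
```

For example, `large_workspaces /home/iojs 10485760` would print only the directories larger than 10G.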

richardlau avatar Dec 08 '20 19:12 richardlau

I've cleaned up the workspaces for test-softlayer-ubuntu1804_sharedlibs_container-x64-2 and test-softlayer-ubuntu1804_sharedlibs_container-x64-3 to free up 30G of space.

richardlau avatar Dec 08 '20 19:12 richardlau

feel free to clean out all of the workspaces, it's not a huge slowdown to recreate them here (like it is on the Pi hosts)

rvagg avatar Dec 09 '20 03:12 rvagg

Out of space again this morning:

root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda2     102821812 97779460    582800 100% /
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs test-softlayer-*
4.0G    test-softlayer-alpine310_container-x64-1
1.6G    test-softlayer-alpine311_container-x64-1
2.3G    test-softlayer-alpine312_container-x64-1
3.7G    test-softlayer-alpine39_container-x64-1
2.2G    test-softlayer-ubi81_container-x64-1
2.5G    test-softlayer-ubuntu1604_arm_cross_container-x64-1
6.6G    test-softlayer-ubuntu1804_arm_cross_container-x64-1
208M    test-softlayer-ubuntu1804_container-x64-1
1.8G    test-softlayer-ubuntu1804_sharedlibs_container-x64-1
15G     test-softlayer-ubuntu1804_sharedlibs_container-x64-2
16G     test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.3G    test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.4G    test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs#

So the 15G workspaces are back and look to be Debug builds. I don't know what the typical expected workspace size is -- whether the sizes have been creeping up slowly over time or if there's been a recent change to cause a jump.

FWIW

root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace# du -hs *
16G     node-test-commit-linux-containered
4.0K    node-test-commit-linux-containered@tmp
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace# cd node-test-commit-linux-containered
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered# du -hs *
120K    AUTHORS
4.0K    BSDmakefile
32K     BUILDING.md
56K     CHANGELOG.md
4.0K    CODE_OF_CONDUCT.md
4.0K    CONTRIBUTING.md
8.0K    GOVERNANCE.md
84K     LICENSE
48K     Makefile
32K     README.md
4.0K    SECURITY.md
4.0K    android-configure
1.5M    benchmark
4.0K    codecov.yml
20K     common.gypi
4.0K    config.gypi
4.0K    config.mk
4.0K    config.status
4.0K    configure
72K     configure.py
56K     configure.pyc
386M    deps
9.9M    doc
4.0K    env.properties
4.0K    glossary.md
72K     icu_config.gypi
2.9M    lib
0       node
52K     node.gyp
12K     node.gypi
0       node_g
16K     onboarding.md
15G     out
4.9M    src
46M     test
248K    test.tap
35M     tools
32K     vcbuild.bat
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered# cd out/
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered/out# du -hs *
14G     Debug
36K     Makefile
854M    Release
12K     cctest.target.mk
156K    deps
12K     embedtest.target.mk
8.0K    fuzz_env.target.mk
8.0K    fuzz_url.target.mk
240K    junit
40K     libnode.target.mk
12K     mkcodecache.target.mk
16K     node.target.mk
4.0K    node_dtrace_header.target.mk
4.0K    node_dtrace_provider.target.mk
4.0K    node_dtrace_ustack.target.mk
4.0K    node_etw.target.mk
12K     node_mksnapshot.target.mk
4.0K    node_text_start.target.mk
4.0K    overlapped-checker.target.mk
4.0K    specialize_node_d.target.mk
508K    tools
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered/out# cd Debug/
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered/out/Debug# du -hs *
1.7M    bytecode_builtins_list_generator
1.1G    cctest
1.1G    embedtest
34M     gen-regexp-special-case
15M     genccode
16M     icupkg
1.1G    mkcodecache
1.4G    mksnapshot
1.1G    node
1.1G    node_mksnapshot
154M    obj
62M     obj.host
6.5G    obj.target
9.9M    openssl-cli
16K     overlapped-checker
39M     torque
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered/out/Debug# ls -alh
total 6.8G
drwxr-xr-x  6 iojs iojs 4.0K Dec  8 19:53 .
drwxr-xr-x  7 iojs iojs 4.0K Dec  8 20:03 ..
drwxr-xr-x  3 iojs iojs 4.0K Dec  8 19:50 .deps
-rwxr-xr-x  1 iojs iojs 1.7M Dec  8 19:38 bytecode_builtins_list_generator
-rwxr-xr-x  1 iojs iojs 1.1G Dec  8 19:53 cctest
-rwxr-xr-x  1 iojs iojs 1.1G Dec  8 19:53 embedtest
-rwxr-xr-x  1 iojs iojs  34M Dec  8 19:40 gen-regexp-special-case
-rwxr-xr-x  1 iojs iojs  15M Dec  8 19:38 genccode
-rwxr-xr-x  1 iojs iojs  16M Dec  8 19:38 icupkg
-rwxr-xr-x  1 iojs iojs 1.1G Dec  8 19:53 mkcodecache
-rwxr-xr-x  1 iojs iojs 1.4G Dec  8 19:50 mksnapshot
-rwxr-xr-x  1 iojs iojs 1.1G Dec  8 19:54 node
-rwxr-xr-x  1 iojs iojs 1.1G Dec  8 19:53 node_mksnapshot
drwxr-xr-x  3 iojs iojs 4.0K Dec  8 19:37 obj
drwxr-xr-x  6 iojs iojs 4.0K Dec  8 19:38 obj.host
drwxr-xr-x 39 iojs iojs 4.0K Dec  8 19:53 obj.target
-rwxr-xr-x  1 iojs iojs 9.9M Dec  8 19:38 openssl-cli
-rwxr-xr-x  1 iojs iojs  16K Dec  8 19:37 overlapped-checker
-rwxr-xr-x  1 iojs iojs  39M Dec  8 19:38 torque
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered/out/Debug#
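The drill-down above was done by hand with `du -hs *` at each level. A small helper could do the same per directory and sort the result. This is a sketch only: `largest_entries` is a hypothetical name, and it uses `find` rather than `*` so that dot-directories such as `.deps` (visible in the `ls -alh` output but not in the `du` output) are counted too.

```shell
# Hypothetical helper: print the N largest entries directly under DIR,
# biggest first, with sizes in KiB.
# Usage: largest_entries DIR [COUNT]
largest_entries() {
  local dir=$1 count=${2:-5}
  # -mindepth 1 -maxdepth 1 selects the immediate children only,
  # including names that start with a dot.
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk -- {} + \
    | sort -rn \
    | head -n "$count"
}
```

Running `largest_entries out/Debug 3` in the workspace would be expected to surface `obj.target` and the large linked binaries first.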

richardlau avatar Dec 09 '20 12:12 richardlau

I was waiting for the inflight https://ci.nodejs.org/job/node-test-commit-linux-containered/ jobs to complete before manually removing things but it looks like the two most recent builds passed. The current disk space usage with nothing running looks like this:

root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   82G   13G  87% /
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs *
4.0G    test-softlayer-alpine310_container-x64-1
1.6G    test-softlayer-alpine311_container-x64-1
2.3G    test-softlayer-alpine312_container-x64-1
3.7G    test-softlayer-alpine39_container-x64-1
2.2G    test-softlayer-ubi81_container-x64-1
2.5G    test-softlayer-ubuntu1604_arm_cross_container-x64-1
6.6G    test-softlayer-ubuntu1804_arm_cross_container-x64-1
208M    test-softlayer-ubuntu1804_container-x64-1
1.8G    test-softlayer-ubuntu1804_sharedlibs_container-x64-1
2.3G    test-softlayer-ubuntu1804_sharedlibs_container-x64-2
16G     test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.3G    test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.4G    test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs#

i.e. we only have one 16G (i.e. Debug build) workspace. We're probably seeing flaky behaviour if more than one https://ci.nodejs.org/job/node-test-commit-linux-containered/ job is running and we end up with two inflight Debug builds, which fill up the disk.
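The arithmetic behind that guess, as a rough sketch. The helper name is hypothetical, sizes are whole gigabytes, and it ignores growth in the other (non-Debug) workspaces, so treat the result as an upper bound.

```shell
# Hypothetical helper: how many more Debug workspaces fit in the free space?
# Usage: additional_debug_builds AVAIL_GB PER_BUILD_GB
additional_debug_builds() {
  local avail_gb=$1 per_build_gb=$2
  # Shell arithmetic is integer-only, so the division rounds down.
  echo $(( avail_gb / per_build_gb ))
}
```

With the 13G available in the `df` output above and roughly 15G per Debug workspace, `additional_debug_builds 13 15` prints 0, i.e. a second concurrent Debug build does not fit.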

richardlau avatar Dec 09 '20 13:12 richardlau

Disk full again, with 15G workspaces on two hosts (suggesting we had concurrent debug builds again):

root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   94G  298M 100% /
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs *
4.0G    test-softlayer-alpine310_container-x64-1
1.6G    test-softlayer-alpine311_container-x64-1
2.3G    test-softlayer-alpine312_container-x64-1
3.7G    test-softlayer-alpine39_container-x64-1
2.2G    test-softlayer-ubi81_container-x64-1
2.5G    test-softlayer-ubuntu1604_arm_cross_container-x64-1
6.6G    test-softlayer-ubuntu1804_arm_cross_container-x64-1
208M    test-softlayer-ubuntu1804_container-x64-1
1.8G    test-softlayer-ubuntu1804_sharedlibs_container-x64-1
15G     test-softlayer-ubuntu1804_sharedlibs_container-x64-2
16G     test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.3G    test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.4G    test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:

feel free to clean out all of the workspaces, it's not a huge slowdown to recreate them here (like it is on the Pi hosts)

I've gone and wiped all the workspaces.

root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# ls -1 | xargs -i bash -c "rm -rf {}/build/workspace/*"
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs *
162M    test-softlayer-alpine310_container-x64-1
199M    test-softlayer-alpine311_container-x64-1
191M    test-softlayer-alpine312_container-x64-1
146M    test-softlayer-alpine39_container-x64-1
218M    test-softlayer-ubi81_container-x64-1
173M    test-softlayer-ubuntu1604_arm_cross_container-x64-1
173M    test-softlayer-ubuntu1804_arm_cross_container-x64-1
208M    test-softlayer-ubuntu1804_container-x64-1
271M    test-softlayer-ubuntu1804_sharedlibs_container-x64-1
277M    test-softlayer-ubuntu1804_sharedlibs_container-x64-2
269M    test-softlayer-ubuntu1804_sharedlibs_container-x64-3
270M    test-softlayer-ubuntu1804_sharedlibs_container-x64-4
265M    test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   38G   57G  40% /
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs#
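If this wipe ends up being scripted, a loop-based variant might be a little more defensive than the `ls | xargs` one-liner. This is a sketch under assumptions: `wipe_workspaces` is a made-up name, and it assumes the same `<container>/build/workspace` layout. Note that GNU xargs documents `-i` as deprecated in favour of `-I`, and that `rm -rf dir/*` leaves dotfiles behind, which the version below does not.

```shell
# Hypothetical helper: empty every */build/workspace directory under BASE,
# leaving the workspace directories themselves in place.
# Usage: wipe_workspaces BASE_DIR
wipe_workspaces() {
  local base=$1 ws
  for ws in "$base"/*/build/workspace; do
    # Skip containers without a workspace directory (or an unmatched glob).
    [ -d "$ws" ] || continue
    # -mindepth 1 keeps $ws itself; -delete removes children depth-first,
    # including dotfiles.
    find "$ws" -mindepth 1 -delete
  done
}
```

It should only be run when no builds are in flight on the host, e.g. `wipe_workspaces /home/iojs`.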

richardlau avatar Dec 10 '20 16:12 richardlau

Cleaned up test-softlayer-ubuntu1804-docker-x64-1 again: before:

root@test-softlayer-ubuntu1804-docker-x64-1:~# df -h /home/iojs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   94G   97M 100% /
root@test-softlayer-ubuntu1804-docker-x64-1:~# du -hs /home/iojs/*
162M    /home/iojs/test-softlayer-alpine310_container-x64-1
1.8G    /home/iojs/test-softlayer-alpine311_container-x64-1
2.1G    /home/iojs/test-softlayer-alpine312_container-x64-1
146M    /home/iojs/test-softlayer-alpine39_container-x64-1
2.2G    /home/iojs/test-softlayer-ubi81_container-x64-1
2.0G    /home/iojs/test-softlayer-ubuntu1604_arm_cross_container-x64-1
11G     /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1
246M    /home/iojs/test-softlayer-ubuntu1804_container-x64-1
2.4G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-1
17G     /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-2
16G     /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.5G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.3G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:~#

richardlau avatar Mar 24 '21 14:03 richardlau

Any chance something similar is going on with https://github.com/nodejs/build/issues/2588?

Trott avatar Mar 24 '21 20:03 Trott

Possibly? Maybe bring test-softlayer-ubuntu1804_sharedlibs_container-x64-3 back online and see if builds still fail on it?

richardlau avatar Mar 24 '21 21:03 richardlau

Hmmm, looks like it's already back online.

Trott avatar Mar 25 '21 04:03 Trott

probably related to an ld failure, likely all goes back to #2573

rvagg avatar Mar 25 '21 05:03 rvagg

@richardlau is the cleanup that originated this issue something that we might enable build helpers to do?

mhdawson avatar Mar 31 '21 21:03 mhdawson

@richardlau is the cleanup that originated this issue something that we might enable build helpers to do?

Yes, it would be a good candidate.

richardlau avatar Mar 31 '21 22:03 richardlau

We should add it to a list somewhere. @AshCripps, do you have anything like that created?

mhdawson avatar Mar 31 '21 22:03 mhdawson

FWIW I've kept this issue open as the underlying issue is that we run multiple containers (5 at the current time) on each docker host, and we typically run into problems when two of the containers on the same host are running debug builds. I think we've only seen it happen on the softlayer host, although I don't know whether that is due to how Jenkins schedules across all the containers or because the softlayer host has less available disk space than the two digitalocean hosts (I'll check the disk sizes tomorrow).

richardlau avatar Mar 31 '21 22:03 richardlau

FWIW re. available disk space:

$ ansible -m shell -a "df -h /home/iojs" "*_docker-*x64*"
test-digitalocean-ubuntu1804_docker-x64-1 | CHANGED | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G  137G   58G  71% /
test-digitalocean-ubuntu1804_docker-x64-2 | CHANGED | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G  142G   53G  73% /
test-softlayer-ubuntu1804_docker-x64-1 | CHANGED | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   80G   15G  85% /

So it looks like the SoftLayer host has half the storage compared to the two Digital Ocean hosts.
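A per-host check along these lines could flag a nearly full disk before builds start failing. It is a sketch, not what Jenkins itself does: the function names and the default 90% limit are assumptions for illustration.

```shell
# Hypothetical helper: print Use% (as a bare integer) for the filesystem
# that holds the given path.
disk_use_percent() {
  # -P forces the one-line-per-filesystem POSIX format; field 5 is "NN%".
  df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed (exit 0) when usage is at or above LIMIT percent.
# Usage: disk_nearly_full PATH [LIMIT]
disk_nearly_full() {
  local path=$1 limit=${2:-90}
  [ "$(disk_use_percent "$path")" -ge "$limit" ]
}
```

For example, `disk_nearly_full /home/iojs 85 && echo "low on space"` could be run through the same `ansible -m shell` invocation.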

richardlau avatar Apr 01 '21 12:04 richardlau

image

(Joyent hosts are expected to be offline.)

iojs@test-digitalocean-ubuntu1804-docker-x64-1:~$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G  194G     0 100% /
iojs@test-digitalocean-ubuntu1804-docker-x64-1:~$ du -hs /home/iojs/test-digitalocean*
6.0G    /home/iojs/test-digitalocean-alpine310_container-x64-1
2.4G    /home/iojs/test-digitalocean-alpine311_container-x64-1
2.4G    /home/iojs/test-digitalocean-alpine312_container-x64-1
3.9G    /home/iojs/test-digitalocean-alpine39_container-x64-1
2.7G    /home/iojs/test-digitalocean-ubi81_container-x64-1
3.4G    /home/iojs/test-digitalocean-ubuntu1604_arm_cross_container-x64-1
264M    /home/iojs/test-digitalocean-ubuntu1604_container-x64-1
7.0G    /home/iojs/test-digitalocean-ubuntu1804_arm_cross_container-x64-1
264M    /home/iojs/test-digitalocean-ubuntu1804_container-x64-1
du: cannot read directory '/home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-1/node-tmp/.tmp.2022qxobQR/middle': Permission denied
15G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-1
19G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-3
du: cannot read directory '/home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-5/node-tmp/.tmp.20220cfyNU/middle': Permission denied
15G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-5
du: cannot read directory '/home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-7/node-tmp/.tmp.2022wqQuzr/middle': Permission denied
15G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-7
4.5G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-9
iojs@test-digitalocean-ubuntu1804-docker-x64-1:~$

i.e. four debug builds

iojs@test-softlayer-ubuntu1804-docker-x64-1:~$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   76G   19G  81% /
iojs@test-softlayer-ubuntu1804-docker-x64-1:~$ du -hs /home/iojs/test-softlayer*
2.6G    /home/iojs/test-softlayer-alpine311_container-x64-1
2.6G    /home/iojs/test-softlayer-alpine312_container-x64-1
2.5G    /home/iojs/test-softlayer-ubi81_container-x64-1
2.4G    /home/iojs/test-softlayer-ubuntu1604_arm_cross_container-x64-1
13G     /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1
264M    /home/iojs/test-softlayer-ubuntu1804_container-x64-1
2.4G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-1
4.3G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-2
327M    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3
340M    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-4
du: cannot read directory '/home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-5/tmp/.tmp.2183JNkJXS/middle': Permission denied
12G     /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-5
iojs@test-softlayer-ubuntu1804-docker-x64-1:~$

Maybe Jenkins hasn't picked up on the space freed in https://github.com/nodejs/build/issues/2611?

And for completeness

iojs@test-digitalocean-ubuntu1804-docker-x64-2:~$ df -h /home/iojs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G  142G   53G  74% /
iojs@test-digitalocean-ubuntu1804-docker-x64-2:~$ du -hs /home/iojs/test-digitalocean-*
4.1G    /home/iojs/test-digitalocean-alpine310_container-x64-2
4.4G    /home/iojs/test-digitalocean-alpine311_container-x64-2
4.6G    /home/iojs/test-digitalocean-alpine312_container-x64-2
154M    /home/iojs/test-digitalocean-alpine39_container-x64-2
du: cannot read directory '/home/iojs/test-digitalocean-ubi81_container-x64-2/node-tmp/.tmp.2022GSfSJR/middle': Permission denied
4.4G    /home/iojs/test-digitalocean-ubi81_container-x64-2
4.2G    /home/iojs/test-digitalocean-ubuntu1604_arm_cross_container-x64-2
264M    /home/iojs/test-digitalocean-ubuntu1604_container-x64-2
3.1G    /home/iojs/test-digitalocean-ubuntu1804_arm_cross_container-x64-2
264M    /home/iojs/test-digitalocean-ubuntu1804_container-x64-2
du: cannot read directory '/home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-10/node-tmp/.tmp.2022n8C2fY/middle': Permission denied
1.9G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-10
2.0G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-2
4.1G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-4
2.1G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-6
3.7G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-8
iojs@test-digitalocean-ubuntu1804-docker-x64-2:~$

richardlau avatar Apr 07 '21 10:04 richardlau

Cleared the workspaces from test-digitalocean-ubuntu1804-docker-x64-1. Eventually Jenkins reenabled the containers on that host and the softlayer one.

richardlau avatar Apr 07 '21 10:04 richardlau

The debug builds are so much bigger than the release builds that we might want to rethink our container strategy... perhaps we could get away with reducing the number of sharedlibs containers on each docker host from five to four and having a dedicated container for debug builds? That would prevent having multiple debug builds running at the same time on any single docker host. It would cut the number of available executors for the debug builds from fifteen down to three but we do not have the disk capacity to run five debug builds (absolute worst case scenario in the current setup) on any of our docker hosts.

richardlau avatar Apr 07 '21 10:04 richardlau

yeah, that's not a bad idea. We could also do post-build cleanup, I don't think we have that enabled for these builds.

rvagg avatar Apr 07 '21 10:04 rvagg

Post build cleanup is an option but wouldn't prevent disk space issues for multiple debug builds in progress on the same docker host at the same time (but would at least recover the space for the next builds).
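As a rough idea of what such a post-build step might look like (a sketch only, not the actual job configuration; `cleanup_debug_output` is a hypothetical name and it assumes the `out/Debug` layout seen in the workspace listings earlier):

```shell
# Hypothetical post-build step: drop the Debug build output from a workspace
# while leaving the checkout and out/Release alone.
# Usage: cleanup_debug_output WORKSPACE_DIR
cleanup_debug_output() {
  local workspace=$1
  # ${var:?} aborts if the path is empty, so this can never expand to
  # "rm -rf /out/Debug".
  rm -rf -- "${workspace:?workspace path required}/out/Debug"
}
```

Since `out/Debug` accounted for about 14G of a 16G workspace, removing just that directory would recover most of the space.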

I'll add separating out the debug builds into its own container to my list of things to do.

richardlau avatar Apr 07 '21 10:04 richardlau

All the softlayer containers were automatically marked offline in Jenkins due to low disk space. FTR:

root@test-softlayer-ubuntu1804-docker-x64-1:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  1.6M  3.2G   1% /run
/dev/xvda2       99G   94G   96M 100% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/xvda1      240M  104M  124M  46% /boot
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/b5f1237849a699cab8de71d79c8f93057f40a3f4c2a5e471839461180f54a566/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/6b3a9c2755a5e07741d33a049cccdc5379257b8bc14fd1b61cac388e45e943af/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/8ee005980aa672b8043512e5a725fc028030c6c8d1b9c136198640c623936538/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/ab4133c96644f6a0a829e4b0cb76ffe33f8d838cb0458843fc6dc4f85e981f5d/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/6e122db71282eebedf790d6f13111c63480e1bb8388ba89fbf1b7619609d8a3b/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/1a5b621e6e2d415a4a20ef0595ca1f66a42c20dcab15e7535a663e7b489e7916/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/0647366017fc8137ba120a6d7db85e9737ae090a96fdcbe946d75d89ff1098fd/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/0b96411bd91465cdce13d8a47758995f0c066a341e4f77c1c7b80366cb653ecb/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/c542054ec9703c9d5e7d42c23c0dbf6a3469a487370b83ccd0aceb238192a003/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/32e66c944d9d650f4913ff7001647ce82ded462d6417db6e1cbcb166ff484199/merged
overlay          99G   94G   96M 100% /var/lib/docker/overlay2/c7f8c6a9c3790ace1b8ebcbc3f3e5409d02ce730f950e261db1a58f88c69b382/merged
tmpfs           3.2G     0  3.2G   0% /run/user/0
root@test-softlayer-ubuntu1804-docker-x64-1:~# cd /home/iojs
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs# du -hs test-softlayer-*/build/workspace/*
2.3G    test-softlayer-alpine311_container-x64-1/build/workspace/node-test-commit-linux
1.2G    test-softlayer-alpine311_container-x64-1/build/workspace/node-test-commit-linux-richardlau
2.3G    test-softlayer-alpine312_container-x64-1/build/workspace/node-test-commit-linux
2.2G    test-softlayer-ubi81_container-x64-1/build/workspace/node-test-commit-linux-containered
4.0K    test-softlayer-ubi81_container-x64-1/build/workspace/node-test-commit-linux-containered@tmp
2.2G    test-softlayer-ubuntu1604_arm_cross_container-x64-1/build/workspace/node-cross-compile
20K     test-softlayer-ubuntu1604_arm_cross_container-x64-1/build/workspace/node-cross-compile@tmp
7.7G    test-softlayer-ubuntu1804_arm_cross_container-x64-1/build/workspace/node-cross-compile
32K     test-softlayer-ubuntu1804_arm_cross_container-x64-1/build/workspace/node-cross-compile@tmp
9.5G    test-softlayer-ubuntu1804_sharedlibs_container-x64-1/build/workspace/node-test-commit-linux-containered
4.0K    test-softlayer-ubuntu1804_sharedlibs_container-x64-1/build/workspace/node-test-commit-linux-containered@tmp
1.7G    test-softlayer-ubuntu1804_sharedlibs_container-x64-2/build/workspace/node-test-commit-linux-containered
4.0K    test-softlayer-ubuntu1804_sharedlibs_container-x64-2/build/workspace/node-test-commit-linux-containered@tmp
4.0K    test-softlayer-ubuntu1804_sharedlibs_container-x64-3/build/workspace/node-test-commit-linux-containered@tmp
18G     test-softlayer-ubuntu1804_sharedlibs_container-x64-4/build/workspace/node-test-commit-linux-containered
4.0K    test-softlayer-ubuntu1804_sharedlibs_container-x64-4/build/workspace/node-test-commit-linux-containered@tmp
2.3G    test-softlayer-ubuntu1804_sharedlibs_container-x64-5/build/workspace/node-test-commit-linux-containered
4.0K    test-softlayer-ubuntu1804_sharedlibs_container-x64-5/build/workspace/node-test-commit-linux-containered@tmp
root@test-softlayer-ubuntu1804-docker-x64-1:/home/iojs#

Cleared test-softlayer-ubuntu1804_sharedlibs_container-x64-1/build/workspace/node-test-commit-linux-containered and test-softlayer-ubuntu1804_sharedlibs_container-x64-4/build/workspace/node-test-commit-linux-containered and the hosts (eventually) enabled themselves again.

I've also gone and removed the ubuntu1804_sharedlibs_debug_x64 label in Jenkins from four of the containers at softlayer (*-2 to *-5), meaning that only test-softlayer-ubuntu1804_sharedlibs_container-x64-1 will schedule debug builds. This seems a quick way to prevent multiple debug builds on the softlayer host -- will need to keep a look out to see if that causes issues for the digital ocean containers. If it does I'll look at the suggestion I made in https://github.com/nodejs/build/issues/2494#issuecomment-814803868.

richardlau avatar May 10 '21 13:05 richardlau

Softlayer docker host is out of space again, reported in https://github.com/nodejs/build/issues/2664 and https://github.com/nodejs/build/issues/2665.

root@test-softlayer-ubuntu1804-docker-x64-1:~# du -hs /home/iojs/*
0       /home/iojs/
2.5G    /home/iojs/test-softlayer-alpine311_container-x64-1
2.5G    /home/iojs/test-softlayer-alpine312_container-x64-1
2.5G    /home/iojs/test-softlayer-ubi81_container-x64-1
269M    /home/iojs/test-softlayer-ubuntu1604_arm_cross_container-x64-1
33G     /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1
290M    /home/iojs/test-softlayer-ubuntu1804_container-x64-1
2.3G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-1
2.4G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-2
2.5G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.6G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.5G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:~#

33G for /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1 being the outlier here.

richardlau avatar Jun 02 '21 11:06 richardlau

I've removed the /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1/build/workspace/node-cross-compile directory.

root@test-softlayer-ubuntu1804-docker-x64-1:~# du -hs /home/iojs/*
0       /home/iojs/
2.5G    /home/iojs/test-softlayer-alpine311_container-x64-1
2.5G    /home/iojs/test-softlayer-alpine312_container-x64-1
2.5G    /home/iojs/test-softlayer-ubi81_container-x64-1
269M    /home/iojs/test-softlayer-ubuntu1604_arm_cross_container-x64-1
269M    /home/iojs/test-softlayer-ubuntu1804_arm_cross_container-x64-1
290M    /home/iojs/test-softlayer-ubuntu1804_container-x64-1
2.3G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-1
2.4G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-2
2.5G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-3
2.6G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-4
2.5G    /home/iojs/test-softlayer-ubuntu1804_sharedlibs_container-x64-5
root@test-softlayer-ubuntu1804-docker-x64-1:~#

richardlau avatar Jun 02 '21 11:06 richardlau

test-softlayer-ubuntu1804_docker-x64-1 was out of space. The test-softlayer-ubuntu1804_arm_cross_container-x64-1 workspace was 27G -- I've deleted it.

richardlau avatar Jul 29 '21 13:07 richardlau

@richardlau is this something that occurs often, and would it be a good candidate to add to the AWX jobs build helpers can run?

mhdawson avatar Jul 30 '21 21:07 mhdawson

The softlayer containers were offline at the beginning of this month (https://github.com/nodejs/build/issues/2803) and that was due to the .git folder in the arm cross-compile workspaces not being cleaned/pruned.
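To spot that kind of growth, the `.git` directories can be measured separately from the rest of each workspace. A sketch, assuming the `<container>/build/workspace/<job>` layout from the earlier listings; `git_dir_sizes` is a hypothetical name. Whether something like `git gc --prune=now` is the right remedy for an oversized one is an assumption and not something confirmed in this thread.

```shell
# Hypothetical helper: report the size (KiB) of the .git directory in each
# job workspace under BASE, largest first.
# Usage: git_dir_sizes BASE_DIR
git_dir_sizes() {
  local base=$1 gitdir
  for gitdir in "$base"/*/build/workspace/*/.git; do
    # Skip workspaces without a checkout (or an unmatched glob).
    [ -d "$gitdir" ] || continue
    du -sk -- "$gitdir"
  done | sort -rn
}
```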

I had to wipe out the workspaces on the softlayer containers on Tuesday as we were out of space again. This didn't seem related to https://github.com/nodejs/build/issues/2803 (i.e. the cross-compile workspaces seemed to be a reasonable size).

Today the containers on test-digitalocean-ubuntu1804_docker-x64-2 are offline for space reasons: image

I'll investigate later this afternoon (I have a medical appointment to attend first).

richardlau avatar Nov 25 '21 12:11 richardlau

Current test-digitalocean-ubuntu1804-docker-x64-2 space usage:

root@test-digitalocean-ubuntu1804-docker-x64-2:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  324M  2.9G  11% /run
/dev/vda1       194G  194G     0 100% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda15      105M  6.7M   98M   7% /boot/efi
overlay         194G  194G     0 100% /var/lib/docker/overlay2/9bd8b27d2bb7e293a55a1b91cccf15093602dc839ce49a67cb4f7ca221028b66/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/b906535907d92563295f4345c2ffffc3c762fc67848849d1e869aa2e40ed2889/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/c008e6cf0135cd79f85f14fd4aa46c0f4e1f67b4439ee007b9161c4b9f4d4265/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/9cc63648a6ba2d48ca6753210dc0959b6e45aea6277cb010251b158f61a2b584/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/5cac407aba5a15f229addd9904605df7fbad3a241144e23b57e45965cd51c390/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/cb19d48ebf158b985f06438c60e1e39a89c8c6aa51ab8ed7407173c8b9cef984/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/508bb2ca92eaa09f49f102d4215f3f37f3cbfa2d018d4fc9257d2a2e9c79088e/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/2de854f1186395900422662e192607148094b383462ba8d7b56149fe84dd3b28/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/2052de449009e2d9f9f17c7cfc8b43bfd981a2eeb956f816f09ccb859a216032/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/37e866d149ac586cbf5f57000fc8f8263833336b66f791b6a77378907aa8a1df/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/36320a9f453d1da6b0e498d13b03853e794b1e21ed39da41c2ee6054df3fabdd/merged
overlay         194G  194G     0 100% /var/lib/docker/overlay2/026ab3aa9261e923ae35edb2ae5ac315cc444da187ef24eb23c77e49cbd82224/merged
tmpfs           3.2G     0  3.2G   0% /run/user/0
root@test-digitalocean-ubuntu1804-docker-x64-2:~# du -hs /home/iojs/.ccache /home/iojs/*
39G     /home/iojs/.ccache
17M     /home/iojs/jenkins_diagnostics.txt
197M    /home/iojs/remoting
852K    /home/iojs/slave.jar
2.9G    /home/iojs/test-digitalocean-alpine311_container-x64-2
2.7G    /home/iojs/test-digitalocean-alpine312_container-x64-2
2.5G    /home/iojs/test-digitalocean-ubi81_container-x64-2
2.8G    /home/iojs/test-digitalocean-ubuntu1604_arm_cross_container-x64-2
340M    /home/iojs/test-digitalocean-ubuntu1604_container-x64-2
3.1G    /home/iojs/test-digitalocean-ubuntu1804_arm_cross_container-x64-2
340M    /home/iojs/test-digitalocean-ubuntu1804_container-x64-2
23G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-10
23G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-2
23G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-4
23G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-6
23G     /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-8
11M     /home/iojs/tools
4.0K    /home/iojs/workspace
root@test-digitalocean-ubuntu1804-docker-x64-2:~#

richardlau avatar Nov 25 '21 14:11 richardlau

The obvious outlier is that it looks like all 5 sharedlibs_containers were trying to build debug builds at the same time. I think this has been exacerbated by the increase in build times (being addressed by https://github.com/nodejs/node/pull/40934) which is leading to builds being queued up and keeping all of the containers busy.

I've run a git clean -fdX in all the workspaces to claim back space (removing the workspace directories entirely would also work to claim back space but that tends to lead to the first jobs that run failing to resolve refs/remotes/origin/_jenkins_local_branch on first use):

root@test-digitalocean-ubuntu1804-docker-x64-2:~# find /home/iojs/*/build/workspace/* -type d -prune -not -name *@tmp -exec sh -c "cd {} && git clean -fdX" \;
...
root@test-digitalocean-ubuntu1804-docker-x64-2:~# du -hs /home/iojs/.ccache /home/iojs/*
39G     /home/iojs/.ccache
17M     /home/iojs/jenkins_diagnostics.txt
197M    /home/iojs/remoting
852K    /home/iojs/slave.jar
1.8G    /home/iojs/test-digitalocean-alpine311_container-x64-2
1.7G    /home/iojs/test-digitalocean-alpine312_container-x64-2
1.8G    /home/iojs/test-digitalocean-ubi81_container-x64-2
2.3G    /home/iojs/test-digitalocean-ubuntu1604_arm_cross_container-x64-2
340M    /home/iojs/test-digitalocean-ubuntu1604_container-x64-2
2.3G    /home/iojs/test-digitalocean-ubuntu1804_arm_cross_container-x64-2
340M    /home/iojs/test-digitalocean-ubuntu1804_container-x64-2
1.9G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-10
1.8G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-2
1.9G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-4
2.0G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-6
1.9G    /home/iojs/test-digitalocean-ubuntu1804_sharedlibs_container-x64-8
11M     /home/iojs/tools
4.0K    /home/iojs/workspace
root@test-digitalocean-ubuntu1804-docker-x64-2:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.2G  1.6M  3.2G   1% /run
/dev/vda1       194G   88G  106G  46% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vda15      105M  6.7M   98M   7% /boot/efi
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/9bd8b27d2bb7e293a55a1b91cccf15093602dc839ce49a67cb4f7ca221028b66/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/b906535907d92563295f4345c2ffffc3c762fc67848849d1e869aa2e40ed2889/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/c008e6cf0135cd79f85f14fd4aa46c0f4e1f67b4439ee007b9161c4b9f4d4265/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/9cc63648a6ba2d48ca6753210dc0959b6e45aea6277cb010251b158f61a2b584/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/5cac407aba5a15f229addd9904605df7fbad3a241144e23b57e45965cd51c390/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/cb19d48ebf158b985f06438c60e1e39a89c8c6aa51ab8ed7407173c8b9cef984/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/508bb2ca92eaa09f49f102d4215f3f37f3cbfa2d018d4fc9257d2a2e9c79088e/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/2de854f1186395900422662e192607148094b383462ba8d7b56149fe84dd3b28/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/2052de449009e2d9f9f17c7cfc8b43bfd981a2eeb956f816f09ccb859a216032/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/37e866d149ac586cbf5f57000fc8f8263833336b66f791b6a77378907aa8a1df/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/36320a9f453d1da6b0e498d13b03853e794b1e21ed39da41c2ee6054df3fabdd/merged
overlay         194G   88G  106G  46% /var/lib/docker/overlay2/026ab3aa9261e923ae35edb2ae5ac315cc444da187ef24eb23c77e49cbd82224/merged
tmpfs           3.2G     0  3.2G   0% /run/user/0
root@test-digitalocean-ubuntu1804-docker-x64-2:~#
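
For repeat use, the cleanup above could be wrapped in a small helper. This is only a sketch, assuming the same /home/iojs/<agent>/build/workspace/<job> layout that the find command walks; `clean_workspaces` is a hypothetical name and not something that exists in the build repo. Compared with the one-liner it quotes every path, skips Jenkins `@tmp` directories explicitly, and only runs in directories that contain a `.git` entry.

```shell
#!/bin/sh
# Hypothetical helper (not part of nodejs/build): run `git clean -fdX`
# in every Jenkins workspace below a base directory.
# Assumed layout: <base>/<agent>/build/workspace/<job>
clean_workspaces() {
  base=${1:-/home/iojs}
  for ws in "$base"/*/build/workspace/*; do
    # Skip unmatched globs and anything that is not a directory.
    [ -d "$ws" ] || continue
    # Skip the @tmp directories Jenkins creates next to workspaces.
    case "$ws" in
      *@tmp) continue ;;
    esac
    # Only act on git checkouts.
    [ -e "$ws/.git" ] || continue
    echo "cleaning $ws"
    # -X removes only ignored files, so the checkout itself is kept.
    git -C "$ws" clean -fdX
  done
}
```

Calling `clean_workspaces /home/iojs` would mirror the command above. Because `-X` limits `git clean` to ignored files, the repository and its refs stay in place, which avoids the first-job failure mentioned for wiping the workspace directories.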

I'll try and find some time to implement the separation of the debug builds (ref: https://github.com/libuv/libuv/issues/3349#issuecomment-957994866).

richardlau avatar Nov 25 '21 16:11 richardlau

@mhdawson turned off the x64 debug builds for master (Node.js 18 onwards) (https://github.com/nodejs/build/issues/2837#issuecomment-999824238) which has alleviated the disk space pressure somewhat.

There remains a discrepancy between the disk space available to the SoftLayer (IBM) host and that available to the two Digital Ocean ones (https://github.com/nodejs/build/issues/2494#issuecomment-811858589), e.g.:

$ ssh test-digitalocean-ubuntu1804_docker-x64-1 df -h /home/iojs
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G  106G   89G  55% /
$ ssh test-digitalocean-ubuntu1804_docker-x64-2 df -h /home/iojs
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       194G   98G   97G  51% /
$ ssh test-softlayer-ubuntu1804_docker-x64-1 df -h /home/iojs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       99G   79G   16G  84% /
$
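
A check like the one above could be turned into a simple threshold filter. The following is a rough sketch, not an existing tool: `over_threshold` is a hypothetical name, and the 80% default is an arbitrary assumption. It reads `df`-style output on stdin (header line first, use percentage in the fifth column, mount point in the sixth) and prints the mount points at or above the limit.

```shell
#!/bin/sh
# Hypothetical helper: read df output on stdin and print
# "<mount point> <use%>" for every filesystem at or above a limit.
# Assumes mount points without spaces, as in the output above.
over_threshold() {
  limit=${1:-80}   # percentage; 80 is an assumed default
  awk -v limit="$limit" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)              # strip the trailing percent sign
      if (use + 0 >= limit) print $6, $5
    }'
}
```

For example, `ssh test-softlayer-ubuntu1804_docker-x64-1 df -h /home/iojs | over_threshold 80` would flag the SoftLayer host in the output above (84%), while the two Digital Ocean hosts (55% and 51%) would produce no output.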

I think it makes sense to bump the storage on the SoftLayer machine to 200GB (I think @mhdawson and I may have had this conversation some time ago). I believe on IBM Cloud this is done by adding "Portable storage". We currently have: image

The recommendation is that the portable storage be in the same location as the server it's being attached to -- our SoftLayer docker host is in Dallas 13. I propose we remove the unattached portable storage and then resize test-softlayer-ubuntu1804-docker-x64-1 with an extra 200GB portable storage.

(n.b. I have a vague recollection that the unattached "jenkins-release-new" is from when we had to rebuild the release CI and had issues attaching the storage and had to get IBM support involved. The release CI server is currently using the shown attached "jenkins-release" portable storage.)

richardlau avatar Mar 31 '22 15:03 richardlau