boinc icon indicating copy to clipboard operation
boinc copied to clipboard

[Explained] 7.16.3 systemd startup file prevents LHC VirtualBox jobs running

Open original-birdman opened this issue 4 years ago • 33 comments

Describe the bug I tried an update to 7.16.3, but it produced an oddity, so I switched back to 7.14.2 But I left the 7.16.3 systemd start-up file in place.

The result was that an LHC CMS task started, but then stalled. It ran for ~6mins (during which the VirtualBox process wasn't consuming CPU) then this showed up in the log:

Oct 28 20:03:53 benuc boinc[16837]: 28-Oct-2019 20:03:53 [LHC@home] Task CMS_1316885_1572237828.669283_0 postponed for 86400 seconds: Communication with VM Hypervisor failed.

Re-installing the 7.14.2 start-up file and re-starting BOINC (no other change) has allowed the job to run.

Steps To Reproduce

  1. Try running an LHC CMS job using the 7.16.3 start-up file.

Expected behavior I'd expect jobs to be able to run successfully.

System Information

  • OS: Kubuntu 19.10
  • BOINC Version: 7.16.3 (start-up file only)

Additional context I'll post the working and non-working systemd start-up files in the follow-ups.

original-birdman avatar Oct 28 '19 21:10 original-birdman

This is the (minorly-modified to reflect my path usage) 7.16.3 systemd config file. This FAILS.

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true
#GML ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
ReadWritePaths=-/extra/BOINC/wlib
Nice=10
User=boinc
#GML WorkingDirectory=/var/lib/boinc
#GML ExecStart=/extra/BOINC/bin/boinc
WorkingDirectory=/extra/BOINC/wlib
ExecStart=/extra/BOINC/bin/boinc --dir /extra/BOINC/wlib --start_delay 20
ExecStop=/extra/BOINC/bin/boinccmd --quit
ExecReload=/extra/BOINC/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true

[Install]
WantedBy=multi-user.target

original-birdman avatar Oct 28 '19 21:10 original-birdman

This is the (similarly slightly-modified) 7.14.2 systemd config file. This WORKS.

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
ProtectHome=true
Type=simple
Nice=10
User=boinc
WorkingDirectory=/extra/BOINC/wlib
ExecStart=/extra/BOINC/bin/boinc --dir /extra/BOINC/wlib --start_delay 20
ExecStop=/extra/BOINC/bin/boinccmd --quit
ExecReload=/extra/BOINC/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle

[Install]
WantedBy=multi-user.target

original-birdman avatar Oct 28 '19 21:10 original-birdman

Here is a context diff between the two:

--- BOINC/GMLbits/boinc-client.service  2019-06-19 10:20:21.000000000 +0100
+++ BOINC.new/GMLbits/boinc-client.service      2019-10-26 12:18:26.524522407 +0100
@@ -4,16 +4,37 @@
 After=network-online.target
 
 [Service]
-ProtectHome=true
 Type=simple
+ProtectHome=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectControlGroups=true
+#GML ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
+ReadWritePaths=-/extra/BOINC/wlib
 Nice=10
 User=boinc
+#GML WorkingDirectory=/var/lib/boinc
+#GML ExecStart=/extra/BOINC/bin/boinc
 WorkingDirectory=/extra/BOINC/wlib
 ExecStart=/extra/BOINC/bin/boinc --dir /extra/BOINC/wlib --start_delay 20
 ExecStop=/extra/BOINC/bin/boinccmd --quit
 ExecReload=/extra/BOINC/bin/boinccmd --read_cc_config
 ExecStopPost=/bin/rm -f lockfile
 IOSchedulingClass=idle
+# The following options prevent setuid root as they imply NoNewPrivileges=true
+# Since Atlas requires setuid root, they break Atlas
+# In order to improve security, if you're not using Atlas,
+# Add these options to the [Service] section of an override file using
+# sudo systemctl edit boinc-client.service
+#NoNewPrivileges=true
+#ProtectKernelModules=true
+#ProtectKernelTunables=true
+#RestrictRealtime=true
+#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
+#RestrictNamespaces=true
+#PrivateUsers=true
+#CapabilityBoundingSet=
+#MemoryDenyWriteExecute=true
 
 [Install]
 WantedBy=multi-user.target

original-birdman avatar Oct 28 '19 21:10 original-birdman

You modified the boinc client systemd unit file. What about using regular systemd unit file provided by boinc?

Germano0 avatar Oct 29 '19 14:10 Germano0

I have to modify it as I build it (BOINC) myself and put it into a non-system location (so it, and the downloaded job data, is preserved on a fresh install of the OS). The only changes are related to that (and adding a start-up delay).

original-birdman avatar Oct 29 '19 14:10 original-birdman

This makes much harder for any of us to try reproduce the problem. Try to reproduce it using upstream boinc client systemd unit file

Germano0 avatar Oct 29 '19 15:10 Germano0

Which has the wrong paths completely, so won't start anything.

original-birdman avatar Oct 29 '19 15:10 original-birdman

This is the diff of the changes I made wrt the standard systemd file:

[benuc]: diff -u boinc-client.service /extra/BOINC.new/GMLbits/
--- boinc-client.service        2019-10-26 11:39:05.000000000 +0100
+++ /extra/BOINC.new/GMLbits/boinc-client.service       2019-10-26 12:18:26.524522407 +0100
@@ -9,11 +9,14 @@
 PrivateTmp=true
 ProtectSystem=strict
 ProtectControlGroups=true
-ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
+#GML ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
+ReadWritePaths=-/extra/BOINC/wlib
 Nice=10
 User=boinc
-WorkingDirectory=/var/lib/boinc
-ExecStart=/extra/BOINC/bin/boinc
+#GML WorkingDirectory=/var/lib/boinc
+#GML ExecStart=/extra/BOINC/bin/boinc
+WorkingDirectory=/extra/BOINC/wlib
+ExecStart=/extra/BOINC/bin/boinc --dir /extra/BOINC/wlib --start_delay 20
 ExecStop=/extra/BOINC/bin/boinccmd --quit
 ExecReload=/extra/BOINC/bin/boinccmd --read_cc_config
 ExecStopPost=/bin/rm -f lockfile

Only changing to reflect the different paths I use.

original-birdman avatar Oct 29 '19 15:10 original-birdman

And, of course, I make the same path changes on the older systemd files; which let VirtualBox jobs run.

So the issue would seem to be related to the other (BOINC-made) changes, namely:

PrivateTmp=true
ProtectSystem=strict
ProtectControlGroups=true

or, possible, the ReadWritePaths= setting. But I don't know what these do, nor how they would affect VirtualBox jobs.

original-birdman avatar Oct 29 '19 15:10 original-birdman

It occurred to me to look at the log of the completed job. The relevant bit is:

<stderr_txt>
2019-10-28 13:31:38 (10480): vboxwrapper (7.7.26196): starting
2019-10-28 13:34:45 (10480): Error in guest additions for VM: -2142830585
Command:
VBoxManage -q list systemproperties
Output:
VBoxManage: error: Failed to create the VirtualBox object!
VBoxManage: error: Code NS_BASE_STREAM_WOULD_BLOCK (0x80470007) - Stream operation would block (extended info not available)
VBoxManage: error: Most likely, the VirtualBox COM server is not running or failed to start.

2019-10-28 13:34:45 (10480): Feature: Checkpoint interval offset (538 seconds)
2019-10-28 13:34:45 (10480): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-28 13:37:51 (10480): Error in host info for VM: -2142830585
Command:
VBoxManage -q list hostinfo 
Output:
VBoxManage: error: Failed to create the VirtualBox object!
VBoxManage: error: Code NS_BASE_STREAM_WOULD_BLOCK (0x80470007) - Stream operation would block (extended info not available)
VBoxManage: error: Most likely, the VirtualBox COM server is not running or failed to start.

2019-10-28 13:37:51 (10480): WARNING: Communication with VM Hypervisor failed.
2019-10-28 13:37:51 (10480): Could not communicate with VM Hypervisor. Rescheduling execution for a later date.
2019-10-28 19:57:39 (16912): vboxwrapper (7.7.26196): starting
2019-10-28 20:00:45 (16912): Error in guest additions for VM: -2142830585
Command:
VBoxManage -q list systemproperties
Output:
VBoxManage: error: Failed to create the VirtualBox object!
VBoxManage: error: Code NS_BASE_STREAM_WOULD_BLOCK (0x80470007) - Stream operation would block (extended info not available)
VBoxManage: error: Most likely, the VirtualBox COM server is not running or failed to start.

2019-10-28 20:00:45 (16912): Feature: Checkpoint interval offset (547 seconds)
2019-10-28 20:00:45 (16912): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-28 20:03:52 (16912): Error in host info for VM: -2142830585
Command:
VBoxManage -q list hostinfo 
Output:
VBoxManage: error: Failed to create the VirtualBox object!
VBoxManage: error: Code NS_BASE_STREAM_WOULD_BLOCK (0x80470007) - Stream operation would block (extended info not available)
VBoxManage: error: Most likely, the VirtualBox COM server is not running or failed to start.

2019-10-28 20:03:52 (16912): WARNING: Communication with VM Hypervisor failed.
2019-10-28 20:03:52 (16912): Could not communicate with VM Hypervisor. Rescheduling execution for a later date.
2019-10-28 20:34:45 (18312): vboxwrapper (7.7.26196): starting
2019-10-28 20:34:46 (18312): Feature: Checkpoint interval offset (241 seconds)
2019-10-28 20:34:46 (18312): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-28 20:34:46 (18312): Detected: Minimum checkpoint interval (600.000000 seconds)
....

The only change that occurred got it going was to revert to the older systemd start-up file. There were no changes to the VirtualBox set-up.

Whilst these errors were happening a VirtualBox process was started, but it stopped after a minute(?) or so.

original-birdman avatar Oct 30 '19 11:10 original-birdman

FYI: I've suspended a CMS job on the system until I have time to start BOINC up with various options from the 3 additions missing, to discover which is/are causing the problem.

original-birdman avatar Oct 30 '19 11:10 original-birdman

FYI: I've suspended a CMS job on the system until I have time to start BOINC up with various options from the 3 additions missing, to discover which is/are causing the problem.

I switched BOINC to 7.16.3 (although the version doesn't actually matter) and started BOINC with various of the 3 options enabled.

The stderr.txt for the job (which is accumulative) is pasted at the end.

18:43:04 start is with all 3 additions commented out. Starts OK. 18:47:42 start is with PrivateTemp enabled. Starts OK. 18:49:35 start is with ProtectSystem also enabled. This fails. vboxwrapper starts, but after 3 mins it fails with "Error in guest additions for VM". 18:55:08 is with ProtectSystem disabled, but PrivateTemp and ProtectControlGroups anabled. Starts OK.

So the problem is with ProtectSystem=strict being set.

Here's the stderr.txt file covering these starts.

2019-10-30 18:43:04 (19070): vboxwrapper (7.7.26196): starting
2019-10-30 18:43:04 (19070): Feature: Checkpoint interval offset (251 seconds)
2019-10-30 18:43:04 (19070): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-30 18:43:04 (19070): Detected: Minimum checkpoint interval (600.000000 seconds)
2019-10-30 18:43:04 (19070): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2019-10-30 18:43:04 (19070): Successfully copied 'init_data.xml' to the shared directory.
2019-10-30 18:43:04 (19070): Create VM. (boinc_deaaf2f10e33db57, slot#5)
2019-10-30 18:43:05 (19070): Setting Memory Size for VM. (2048MB)
2019-10-30 18:43:05 (19070): Setting CPU Count for VM. (1)
2019-10-30 18:43:05 (19070): Setting Chipset Options for VM.
2019-10-30 18:43:05 (19070): Setting Boot Options for VM.
2019-10-30 18:43:05 (19070): Setting Network Configuration for NAT.
2019-10-30 18:43:05 (19070): Enabling VM Network Access.
2019-10-30 18:43:05 (19070): Disabling USB Support for VM.
2019-10-30 18:43:05 (19070): Disabling COM Port Support for VM.
2019-10-30 18:43:05 (19070): Disabling LPT Port Support for VM.
2019-10-30 18:43:05 (19070): Disabling Audio Support for VM.
2019-10-30 18:43:05 (19070): Disabling Clipboard Support for VM.
2019-10-30 18:43:06 (19070): Disabling Drag and Drop Support for VM.
2019-10-30 18:43:06 (19070): Adding storage controller(s) to VM.
2019-10-30 18:43:06 (19070): Adding virtual disk drive to VM. (vm_image.vdi)
2019-10-30 18:43:06 (19070): Adding VirtualBox Guest Additions to VM.
2019-10-30 18:43:06 (19070): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2019-10-30 18:43:06 (19070): forwarding host port 45653 to guest port 80
2019-10-30 18:43:06 (19070): Enabling remote desktop for VM.
2019-10-30 18:43:06 (19070): Enabling shared directory for VM.
2019-10-30 18:43:07 (19070): Starting VM. (boinc_deaaf2f10e33db57, slot#5)
2019-10-30 18:43:08 (19070): Successfully started VM. (PID = '19555')
2019-10-30 18:43:08 (19070): Reporting VM Process ID to BOINC.
2019-10-30 18:43:08 (19070): Guest Log: BIOS: VirtualBox 6.0.14
2019-10-30 18:43:08 (19070): Guest Log: CPUID EDX: 0x178bfbff
2019-10-30 18:43:08 (19070): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2019-10-30 18:43:08 (19070): VM state change detected. (old = 'poweroff', new = 'running')
2019-10-30 18:43:08 (19070): Detected: Web Application Enabled (http://localhost:45653)
2019-10-30 18:43:08 (19070): Detected: Remote Desktop Enabled (localhost:59707)
2019-10-30 18:43:08 (19070): Preference change detected
2019-10-30 18:43:08 (19070): Setting CPU throttle for VM. (70%)
2019-10-30 18:43:08 (19070): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))
2019-10-30 18:43:10 (19070): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2019-10-30 18:43:10 (19070): Guest Log: BIOS: Booting from Hard Disk...
2019-10-30 18:43:13 (19070): Guest Log: BIOS: KBD: unsupported int 16h function 03
2019-10-30 18:43:13 (19070): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2019-10-30 18:43:31 (19070): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2019-10-30 18:43:31 (19070): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2019-10-30 18:43:56 (19070): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000123 main     Log opened 2019-10-30T18:43:55.132615000Z
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000345 main     OS Product: Linux
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000376 main     OS Release: 4.14.44-11.cernvm.x86_64
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000391 main     OS Version: #1 SMP Mon May 28 16:49:00 CEST 2018
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000404 main     Executable: /usr/share/vboxguest52/usr/sbin/VBoxService
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000405 main     Process ID: 2948
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000406 main     Package type: LINUX_64BITS_GENERIC
2019-10-30 18:43:56 (19070): Guest Log: 00:00:00.000921 main     5.2.6 r120293 started. Verbose level = 0
2019-10-30 18:44:15 (19070): Guest Log: [INFO] Mounting the shared directory
2019-10-30 18:44:15 (19070): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] 0
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded!
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] 0
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Testing connection to Condor server on port 9618
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded!
2019-10-30 18:44:15 (19070): Guest Log: [DEBUG] 0
2019-10-30 18:45:19 (19070): Guest Log: [DEBUG] Probing CVMFS ...
2019-10-30 18:45:20 (19070): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2019-10-30 18:45:20 (19070): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2019-10-30 18:45:20 (19070): Guest Log: 2.4.4.0 3691 1 25840 10619 3 1 1459451 10240001 2 65024 0 3 100 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1
2019-10-30 18:45:27 (19070): Guest Log: [INFO] Reading volunteer information
2019-10-30 18:45:27 (19070): Guest Log: [INFO] Volunteer: Gordon Lack (578185)
2019-10-30 18:45:28 (19070): Guest Log: [INFO] VMID: c5ab4400-4ef6-44a9-aaed-bee78432355d
2019-10-30 18:45:28 (19070): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-10-30 18:45:28 (19070): Guest Log: [INFO] Running the fast benchmark.
2019-10-30 18:46:03 (19070): Guest Log: [INFO] Machine performance 14.04 HEPSPEC06
2019-10-30 18:46:03 (19070): Guest Log: [INFO] CMS application starting. Check log files.
2019-10-30 18:46:04 (19070): Guest Log: [DEBUG] HTCondor ping
2019-10-30 18:46:05 (19070): Guest Log: [DEBUG] 0
2019-10-30 18:47:42 (22831): vboxwrapper (7.7.26196): starting
2019-10-30 18:47:42 (22831): Feature: Checkpoint interval offset (2 seconds)
2019-10-30 18:47:42 (22831): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-30 18:47:43 (22831): Detected: Minimum checkpoint interval (600.000000 seconds)
2019-10-30 18:47:43 (22831): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2019-10-30 18:47:43 (22831): Guest Log: BIOS: VirtualBox 6.0.14
2019-10-30 18:47:43 (22831): Guest Log: CPUID EDX: 0x178bfbff
2019-10-30 18:47:43 (22831): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2019-10-30 18:47:43 (22831): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2019-10-30 18:47:43 (22831): Guest Log: BIOS: Booting from Hard Disk...
2019-10-30 18:47:43 (22831): Guest Log: BIOS: KBD: unsupported int 16h function 03
2019-10-30 18:47:43 (22831): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2019-10-30 18:47:43 (22831): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2019-10-30 18:47:43 (22831): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2019-10-30 18:47:43 (22831): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000123 main     Log opened 2019-10-30T18:43:55.132615000Z
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000345 main     OS Product: Linux
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000376 main     OS Release: 4.14.44-11.cernvm.x86_64
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000391 main     OS Version: #1 SMP Mon May 28 16:49:00 CEST 2018
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000404 main     Executable: /usr/share/vboxguest52/usr/sbin/VBoxService
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000405 main     Process ID: 2948
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000406 main     Package type: LINUX_64BITS_GENERIC
2019-10-30 18:47:43 (22831): Guest Log: 00:00:00.000921 main     5.2.6 r120293 started. Verbose level = 0
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Mounting the shared directory
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] 0
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded!
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] 0
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Testing connection to Condor server on port 9618
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded!
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] 0
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] Probing CVMFS ...
2019-10-30 18:47:43 (22831): Guest Log: Probing /cvmfs/grid.cern.ch... OK
2019-10-30 18:47:43 (22831): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2019-10-30 18:47:43 (22831): Guest Log: 2.4.4.0 3691 1 25840 10619 3 1 1459451 10240001 2 65024 0 3 100 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Reading volunteer information
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Volunteer: Gordon Lack (578185)
2019-10-30 18:47:43 (22831): Guest Log: [INFO] VMID: c5ab4400-4ef6-44a9-aaed-bee78432355d
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Running the fast benchmark.
2019-10-30 18:47:43 (22831): Guest Log: [INFO] Machine performance 14.04 HEPSPEC06
2019-10-30 18:47:43 (22831): Guest Log: [INFO] CMS application starting. Check log files.
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] HTCondor ping
2019-10-30 18:47:43 (22831): Guest Log: [DEBUG] 0
2019-10-30 18:47:43 (22831): Starting VM. (boinc_deaaf2f10e33db57, slot#5)
2019-10-30 18:47:44 (22831): Successfully started VM. (PID = '22958')
2019-10-30 18:47:44 (22831): Reporting VM Process ID to BOINC.
2019-10-30 18:47:44 (22831): VM state change detected. (old = 'poweroff', new = 'running')
2019-10-30 18:47:44 (22831): Detected: Web Application Enabled (http://localhost:45653)
2019-10-30 18:47:44 (22831): Detected: Remote Desktop Enabled (localhost:59707)
2019-10-30 18:47:44 (22831): Preference change detected
2019-10-30 18:47:44 (22831): Setting CPU throttle for VM. (70%)
2019-10-30 18:47:44 (22831): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))
2019-10-30 18:49:35 (24352): vboxwrapper (7.7.26196): starting
2019-10-30 18:52:41 (24352): Error in guest additions for VM: -2142830585
Command:
VBoxManage -q list systemproperties
Output:
VBoxManage: error: Failed to create the VirtualBox object!
VBoxManage: error: Code NS_BASE_STREAM_WOULD_BLOCK (0x80470007) - Stream operation would block (extended info not available)
VBoxManage: error: Most likely, the VirtualBox COM server is not running or failed to start.

2019-10-30 18:52:41 (24352): Feature: Checkpoint interval offset (114 seconds)
2019-10-30 18:52:41 (24352): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-30 18:55:08 (24736): vboxwrapper (7.7.26196): starting
2019-10-30 18:55:08 (24736): Feature: Checkpoint interval offset (355 seconds)
2019-10-30 18:55:08 (24736): Detected: VirtualBox VboxManage Interface (Version: 6.0.14)
2019-10-30 18:55:08 (24736): Detected: Minimum checkpoint interval (600.000000 seconds)
2019-10-30 18:55:08 (24736): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2019-10-30 18:55:08 (24736): Guest Log: BIOS: VirtualBox 6.0.14
2019-10-30 18:55:08 (24736): Guest Log: CPUID EDX: 0x178bfbff
2019-10-30 18:55:08 (24736): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2019-10-30 18:55:08 (24736): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2019-10-30 18:55:08 (24736): Guest Log: BIOS: Booting from Hard Disk...
2019-10-30 18:55:08 (24736): Guest Log: BIOS: KBD: unsupported int 16h function 03
2019-10-30 18:55:08 (24736): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2019-10-30 18:55:08 (24736): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2019-10-30 18:55:08 (24736): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2019-10-30 18:55:08 (24736): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000125 main     Log opened 2019-10-30T18:48:26.132457000Z
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000258 main     OS Product: Linux
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000286 main     OS Release: 4.14.44-11.cernvm.x86_64
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000301 main     OS Version: #1 SMP Mon May 28 16:49:00 CEST 2018
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000337 main     Executable: /usr/share/vboxguest52/usr/sbin/VBoxService
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000338 main     Process ID: 2902
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.000338 main     Package type: LINUX_64BITS_GENERIC
2019-10-30 18:55:08 (24736): Guest Log: 00:00:00.001296 main     5.2.6 r120293 started. Verbose level = 0
2019-10-30 18:55:08 (24736): Guest Log: [INFO] Mounting the shared directory
2019-10-30 18:55:08 (24736): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] 0
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded!
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] 0
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Testing connection to Condor server on port 9618
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded!
2019-10-30 18:55:08 (24736): Guest Log: [DEBUG] 0
2019-10-30 18:55:08 (24736): Starting VM. (boinc_deaaf2f10e33db57, slot#5)
2019-10-30 18:55:09 (24736): Successfully started VM. (PID = '24863')
2019-10-30 18:55:09 (24736): Reporting VM Process ID to BOINC.
2019-10-30 18:55:10 (24736): VM state change detected. (old = 'poweroff', new = 'running')
2019-10-30 18:55:10 (24736): Detected: Web Application Enabled (http://localhost:45653)
2019-10-30 18:55:10 (24736): Detected: Remote Desktop Enabled (localhost:59707)
2019-10-30 18:55:10 (24736): Preference change detected
2019-10-30 18:55:10 (24736): Setting CPU throttle for VM. (70%)
2019-10-30 18:55:10 (24736): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 600 seconds))

original-birdman avatar Oct 30 '19 19:10 original-birdman

Can confirm, ProtectSystem=strict breaks ATLAS (LHC@home virtualbox) tasks.

strict prevents writing to anything not in boinc's home directory and not in ReadWritePaths= entries, except /dev, /sys, and /proc (as those are "to be interacted with" API special directories)

Reducing it to ProtectSystem=full fixes it.

true means hard prevent writing of /usr, /boot. full means adding /etc to that list too. Note that even though /etc is blocked from writing, the /etc/boinc-client in ReadWritePaths= pokes a hole to let is write to /etc/boinc-client even if the rest of /etc is locked.

However, reducing it to full is a major reduction in security. (Note that home directories besides the boinc user's is still denied by ProtectHome=true) Probably a better approach is figuring out what paths VBoxManage tries to bind to, and add those to ReadWritePaths=

techsy730 avatar Apr 14 '21 00:04 techsy730

Probably a better approach is figuring out what paths VBoxManage tries to bind to, and add those to BindPaths= instead.

It occurs to me that the VirtualBox jobs (try to) write to (in my case):

 /extra/BOINC/lib/.config/VirtualBox

which is (at least for me) not under the boinc account's HOME (which I have as /extra/BOINC/wlib).

Why doesn't it just put these under wlib (which is a location for "dynamic" files) rather than lib (which is surely for "fixed" ones)?

original-birdman avatar Apr 14 '21 00:04 original-birdman

I noticed that VBoxXPCOMIPCD tries to open stuff in the directory /tmp/.vbox-boinc-ipc. So maybe that needs to be added to ReadWritePaths=?

Also adding Wants=virtualbox.service may be a good idea. That adds a non-binding want relationship (and does not specify an ordering relationship either). So if virtualbox.service doesn't exist or doesn't start up properly, it won't block boinc-client.service from starting. If it does exist and isn't already running, it starts it at the same time as boinc-client (Before= is what establishes an ordering on a unit finishing, but we don't want that here).


Oh and heads up @original-birdman , you can totally modify the original service file without creating a new .service file by or editing the original. Make new .conf files under /etc/systemd/system/boinc-client.service.d/ and override fields you want to change (section names are still needed, so you still need [Service] or [Unit] or whatever). Only fields listed will be overridden, the other will be taken from the original .service file. (See man systemd.unit and search for "drop-in") If you are installing boinc-client outside of apt, then this doesn't apply to you, as you don't have a system boinc-client.service to modify. (Also works for any other service or unit)

techsy730 avatar Apr 14 '21 00:04 techsy730

Thanks for the override info - I do use it for other services on my systems (I have one for getty@). But given that I'm installing the application (and hence start-up files) from my own build anyway I may as well just install my edited file.

original-birdman avatar Apr 14 '21 01:04 original-birdman

@original-birdman How did you get the stderr of a task in boinc? I can't seem to find it in the project's folder

techsy730 avatar Apr 14 '21 13:04 techsy730

stderr:

  1. While running: it's in the slot directory for the task. If using BOINC Manager, highlight the task and click 'properties' to see the slot number.
  2. After completion: the file is embedded in client_state.xml, pending report.
  3. After reporting: view the task on the project's web site.

RichardHaselgrove avatar Apr 14 '21 14:04 RichardHaselgrove

Here's one (for a non-LHC, non-VBox job, but the principle is the same)

[parent]: pwd
/extra/BOINC/wlib/slots/2
[parent]: ls -l stderr.txt 
-rw-r--r-- 1 boinc boinc 5307 Apr 14 13:42 stderr.txt

original-birdman avatar Apr 14 '21 15:04 original-birdman

I can confirm that adding the following override solved this problem for me (on Arch Linux, using standard service file):

[Service]
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client -/tmp/.vbox-boinc-ipc

The vbox commands simply need to write the temporary IPC folder.

octylFractal avatar Apr 21 '21 06:04 octylFractal

Just tried adding that override, didn't work for me. Don't see .vbox-boinc-ipc in /tmp/ either. Running Manjaro Linux, BOINC 7.16.16 & VirtualBox 6.1.22. I'll try some deeper digging when I get home later today.

dobb13 avatar May 07 '21 12:05 dobb13

After testing ProtectSystem=full, /tmp/.vbox-boinc-ipc/ appears. Do we have a better way to properly create this folder, and protect the rest of the filesystem?

dobb13 avatar May 08 '21 00:05 dobb13

I found two possible solutions. Both of these work with ProtectSystem=strict. First the easy way: allow writing to /tmp so VirtualBox can create the directory it uses.

Edit boinc-client.service:

ReadWritePaths=-/var/lib/boinc -/etc/boinc-client -/tmp

Done. Or if we didn't want to allow write access to all of /tmp, we could get systemd to make /tmp/.vbox-boinc-ipc first.

Edit boinc-client.service:

ReadWritePaths=-/var/lib/boinc -/etc/boinc-client -/tmp/.vbox-boinc-ipc

Add /usr/lib/tmpfiles.d/boinc-vbox.conf:

# Make temporary directory for VirtualBox socket used by BOINC client
# so we don't have to allow write-access to all of /tmp in boinc-client.service.
d /tmp/.vbox-boinc-ipc 700 boinc boinc - -

The directory is created by the systemd-tmpfiles-setup service at boot time. I don't know enough about systemd to say if option 2 is reasonable. Option 1 might be good enough. Any systemd experts here? References: https://www.virtualbox.org/manual/ch02.html#startingvboxonlinux https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html

pulsa avatar Feb 20 '22 22:02 pulsa

The settings in the systemd service config should be based on the options passed into configure.

ReadWritePaths doesn't seem to be,

original-birdman avatar Feb 20 '22 22:02 original-birdman

@original-birdman, I mount a separate volume at /var/lib/boinc-client to avoid changing any paths in the service file. I think a bind mount would do the same thing. Would that work for you? I agree, it would be nice if the service file were more dynamic... but that might be more work than fixing VirtualBox, which I took to be the focus of this issue.

pulsa avatar Feb 20 '22 23:02 pulsa

@original-birdman, I mount a separate volume at /var/lib/boinc-client to avoid changing any paths in the service file. I think a bind mount would do the same thing.

It's a good idea....thanks.

original-birdman avatar Feb 21 '22 00:02 original-birdman

Here are my current modifications in boinc-client.service.d/override.conf (well, the ones relevant to this bug; I have a lot more :stuck_out_tongue:):

[Service]
# For VBox communication
ReadWritePaths=-/tmp/.vbox-boinc-ipc
# For X11 idle detection (I think only .X11-unix is needed for this, but I figured "Why not the others too?"
BindReadOnlyPaths=-/tmp/.X11-unix/ -/tmp/.ICE-unix -/tmp/.fonts-unix

The /tmp/.vbox-boinc-ipc might work with BindPath=, but I didn't want to risk the VBoxService getting confused with the socket and all.


I also have a systemd-tmpfiles configuration for ensuring that /tmp/.vbox-boinc-ipc exists, because boinc-client I am still not giving permissions to write in the "real" /tmp folder. Without this, if it didn't already exist, VBoxService running as BOINC wouldn't have permission to create it.

# In vbox-ipc-tmpdir.conf

# Create a directory /tmp/.vbox-boinc-ipc if it doesn't already exist, and give it to boinc:boinc. No expiration times.
d /tmp/.vbox-boinc-ipc 700 boinc boinc -

and

# In the above mentioned boinc-client.service.d/overrides.conf
[Service]
ExecStartPre=-+/bin/systemd-tmpfiles --create /usr/local/lib/boinc-client/vbox-ipc-tmpdir.conf

(notice the +, which tells systemd it is fine to execute this particular part with full permissions, not sandboxed, so it can make dirs in /tmp like normal)

However, an easier, if moderately less secure, solution would probably just be disabling /tmp sandboxing

[Service]
# DISABLE the /tmp sandboxing for boinc
PrivateTmp=false

This is probably still not a great idea for security though.

techsy730 avatar Feb 21 '22 00:02 techsy730

Also, at this point, is this even a BOINC bug?

If the .service is derived from BOINC's codebase, then yeah it is.

But if the offending .service file was done by the Debian team (or Ubuntu team) then this discussion needs to be migrated to their bug tracking system.

techsy730 avatar Feb 21 '22 00:02 techsy730

The systemd file produced by the BOINC build contains the settings that cause the problem. So yes, it's a BOINC bug (IMHO).

original-birdman avatar Feb 21 '22 02:02 original-birdman

Hello, I have a similar problem on my side. I solved the problem by uncommenting this line : PrivateUsers=true

I talk about it on the forum of LHC@home : https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5818

psampont avatar Mar 28 '22 08:03 psampont