Cloud Foundry bosh deploy with Noble stemcell is failing sporadicly on BBL environment
ARD working group is working on initiative to integrate Noble stemcell as default one in our cf-deployment release pipeline. We started with automatic pipeline running on GCP infrastructure and since more than a week have pipeline running on AWS infra.
The deployment in general is successful but we observe sporadic failing deployments.
The bosh task is failing executing bosh -n -d cf deploy with the following error message:
Task 103250 | 11:44:08 | Compiling packages: percona-xtradb-cluster-8.4/63c42001061a3c6a53c528c8647885ad18721b40eb7944eeea924cabe9332a7d
Task 103250 | 11:48:14 | Compiling packages: libtool/63f9e595c3ad8340ffc57add4e4ed5a8bea15b762582be1a058cf59f303a88b5 (00:10:58)
L Error: Timed out pinging VM 'i-04394880118084ed0' with agent '19823291-2eb8-470e-815a-fd9ec319966a' after 600 seconds
Task 103250 | 11:48:19 | Compiling packages: tini/b1e8f2328be2ec004105c997fe0b6a1020b893fd9c6f238eb661326642be4e33 (00:11:03)
L Error: Timed out pinging VM 'i-08025063798fb2021' with agent 'ba999dc9-f452-41c7-ba74-59cc06cacb17' after 600 seconds
Task 103250 | 11:55:35 | Compiling packages: percona-xtrabackup-2.4/3fa209bf4f667d0719adf4c91969e7c2f8e8013d217986b7e6cfd00838f737b6 (00:11:43)
Task 103250 | 12:12:38 | Compiling packages: percona-xtrabackup-8.0/8cf94d80809b3bf19afaf1be9385f4292c9b5cc4578f1bbb897493d1e24a77f1 (00:28:57)
Task 103250 | 12:23:15 | Compiling packages: percona-xtrabackup-8.4/14fca05a7b64fe5f33d0debb56ca35e919fa75d3b372c32b584efc14c7b374cd (00:39:37)
Task 103250 | 13:07:04 | Compiling packages: percona-xtradb-cluster-8.4/63c42001061a3c6a53c528c8647885ad18721b40eb7944eeea924cabe9332a7d (01:22:56)
Task 103250 | 13:07:57 | Error: Timed out pinging VM 'i-04394880118084ed0' with agent '19823291-2eb8-470e-815a-fd9ec319966a' after 600 seconds
After retriggering the Concourse deoly job the deployment in most of the cases finishes with success. Could you advice us what could be the root cause of such an issue. This is not observed on GCP infra.
Deployment can also fail during VMs creation.
Task 103307 | 13:23:33 | Creating missing vms: diego-cell/55943a69-942c-4000-9dd4-8c4d90dc2259 (0) (00:13:23)
L Error: Timed out pinging VM 'i-01cf8571465adb6ff' with agent '82392b52-cdfe-4fd6-979b-ae146f24f246' after 600 seconds
Task 103307 | 13:23:34 | Creating missing vms: uaa/812aa232-6a0d-4372-852c-2e0ccc72dcf2 (1) (00:13:24)
L Error: Timed out pinging VM 'i-084071c23ba59e57c' with agent '4ee4ce75-79c0-461e-aa35-1c3fc1da71f8' after 600 seconds
Task 103307 | 13:23:56 | Creating missing vms: doppler/465f8a61-9f11-4865-a0d2-e8b70098e46e (0) (00:13:46)
L Error: Timed out pinging VM 'i-0cfe1277c25e5e6fb' with agent 'a26e29de-90e9-4c7e-bc6f-6bfcfa756661' after 600 seconds
Task 103307 | 13:23:56 | Error: Timed out pinging VM 'i-01cf8571465adb6ff' with agent '82392b52-cdfe-4fd6-979b-ae146f24f246' after 600 seconds
Find attached bosh-agent.log + syslog from a failing VM.
Compilation VM type used on AWS is c5a.xlarge.
you mentioned that you don't face any issues on GCP with noble. Can you share the cloud-configs for both the environments?
Find attached the cloud-config for both BBL environments.
I see you use more virtual cpus for compiling on GCP (8) as compared to AWS (4). Also, for GCP you use Intel processors as compared to AMD on AWS. Could you try using c5.2xlarge as the vm type for compilation vms, making it similar to GCP?
Investigating together with @ramonskie we managed to find the agent is restarting:
2025-09-03T09:39:41.317269+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: ********************
2025-09-03T09:39:41.317307+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:41 DEBUG - Running command 'ip r'
2025-09-03T09:39:41.317735+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:41 DEBUG - Successful: true (0)
2025-09-03T09:39:45.846628+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:45 DEBUG - Running command 'arping -c 1 -U -I eth0 10.0.32.9'
2025-09-03T09:39:46.854628+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:46 DEBUG - Stdout: ARPING 10.0.32.9 from 10.0.32.9 eth0
2025-09-03T09:39:46.854715+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: Sent 1 probes (1 broadcast(s))
2025-09-03T09:39:46.854766+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: Received 0 response(s)
2025-09-03T09:39:46.854826+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:46 DEBUG - Stderr:
2025-09-03T09:39:46.854861+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:46 DEBUG - Successful: true (0)
2025-09-03T09:39:51.855078+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:51 DEBUG - Running command 'arping -c 1 -U -I eth0 10.0.32.9'
2025-09-03T09:39:52.866639+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:52 DEBUG - Stdout: ARPING 10.0.32.9 from 10.0.32.9 eth0
2025-09-03T09:39:52.866726+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: Sent 1 probes (1 broadcast(s))
2025-09-03T09:39:52.866835+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: Received 0 response(s)
2025-09-03T09:39:52.866888+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:52 DEBUG - Stderr:
2025-09-03T09:39:52.866926+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [Cmd Runner] 2025/09/03 09:39:52 DEBUG - Successful: true (0)
2025-09-03T09:39:53.356599+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[573]: [main] 2025/09/03 09:39:53 ERROR - Agent exited with error: received signal (terminated): stopping now
2025-09-03T09:43:08.235238+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[596]: [main] 2025/09/03 09:43:07 DEBUG - Starting agent
2025-09-03T09:43:08.235262+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[596]: [File System] 2025/09/03 09:43:07 DEBUG - Reading file /var/vcap/bosh/agent.json
2025-09-03T09:43:08.235269+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[596]: [File System] 2025/09/03 09:43:07 DEBUG - Read content
2025-09-03T09:43:08.235276+00:00 7cef1b80-dfd9-4ec2-bd24-8de90ce45118 bosh-agent[596]: ********************
One assumption is that it is caused by a network issue.
We have configured using c5.2xlarge as the vm type.
root@a6b9e08df9c7:/home/bbl/bbl-state# vim bosh-cc.yml
root@a6b9e08df9c7:/home/bbl/bbl-state# bosh ucc bosh-cc.yml
Using environment 'https://10.0.0.6:25555' as client 'admin'
vm_types:
- name: c5a.xlarge
cloud_properties:
- instance_type: c5a.xlarge
+ instance_type: c5.2xlarge
compilation:
- vm_type: c5.2xlarge
+ vm_type: c5a.xlarge
Continue? [yN]: y
The deployments is failing with the same error message after this change.
Task 275 | 12:55:56 | Preparing deployment: Preparing deployment (00:00:23)
Task 275 | 12:56:19 | Preparing deployment: Rendering templates (00:00:13)
Task 275 | 12:56:33 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 275 | 12:56:33 | Creating missing vms: diego-api/ff1f42b8-263e-4e36-919c-e821acb6ed31 (1) (00:14:31)
L Error: Timed out pinging VM 'i-09bcb701465efd2b2' with agent '44028e79-b923-4af0-a0d2-dba8e54646ea' after 600 seconds
Task 275 | 13:11:04 | Error: Timed out pinging VM 'i-09bcb701465efd2b2' with agent '44028e79-b923-4af0-a0d2-dba8e54646ea' after 600 seconds
We discussed in the meeting, that the agent is receiving a signal.
Assuming that systemd is the one sending the signal to the agent, can we proof that? One direction to check if we can configure systemd to give agent a higher priority, see: https://unix.stackexchange.com/questions/788554/set-highest-cpu-and-io-priority-for-a-systemd-service/788555#788555
We are taking over the investigation process from Dimitar to find the RC. As a first step, we modified the director to to keep the unreachable vms during compilation failure and also increase the agent wait timeout and apply those changes during the bbl-up:
- type: replace
path: /instance_groups/name=bosh/properties/director/debug?/keep_unreachable_vms
value: true
- type: replace
path: /instance_groups/name=bosh/properties/agent?/agent_wait_timeout
value: 1200
We were not able to find any kind of logs from the bosh agent nor any of the processes that should be running on this compilation vm. Therefore, we run this command manually:
exec nice -n -15 /var/vcap/bosh/bin/bosh-agent -P $(cat /var/vcap/bosh/etc/operating_system) -C /var/vcap/bosh/agent.json
But it could not start the agent. Here are the logs.
exec nice -n -15 /var/vcap/bosh/bin/bosh-agent -P $(cat /var/vcap/bosh/etc/operating_system) -C /var/vcap/bosh/agent.json
[main] 2025/10/01 12:24:18 DEBUG - Starting agent
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /var/vcap/bosh/agent.json
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
{
"Platform": {
"Linux": {
"PartitionerType": "parted",
"DevicePathResolutionType": "virtio",
"CreatePartitionIfNoEphemeralDisk": true,
"ServiceManager": "systemd",
"DiskIDTransformPattern": "^vol-(.+)$",
"DiskIDTransformReplacement": "nvme-Amazon_Elastic_Block_Store_vol${1}"
}
},
"Infrastructure": {
"Settings": {
"Sources": [
{
"Type": "HTTP",
"URI": "http://169.254.169.254",
"UserDataPath": "/latest/user-data",
"InstanceIDPath": "/latest/meta-data/instance-id",
"SSHKeysPath": "/latest/meta-data/public-keys/0/openssh-key",
"TokenPath": "/latest/api/token"
}
],
"UseRegistry": true
}
}
}
********************
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_version
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
1.90
********************
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_git_sha1
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
509d711e8e72b9cbf69d1b0123139821c28c1f1d+
********************
[App] 2025/10/01 12:24:18 INFO - Running on stemcell version '1.90' (git: 509d711e8e72b9cbf69d1b0123139821c28c1f1d+)
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /var/vcap/bosh/agent_state.json
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/var/vcap/bosh/agent_state.json'
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /var/vcap/bosh/agent_state.json
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
{"Linux":{"hosts_configured":true}}
********************
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Running command 'bosh-agent-rc'
[unlimitedRetryStrategy] 2025/10/01 12:24:18 DEBUG - Making attempt #0
[DelayedAuditLogger] 2025/10/01 12:24:18 DEBUG - Starting logging to syslog...
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Stdout:
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Successful: true (0)
[File System] 2025/10/01 12:24:18 DEBUG - Glob '/sys/class/net/*'
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /sys/class/net/eth0/device
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/sys/class/net/eth0/device'
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/eth0/ifalias
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
********************
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/eth0/address
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
02:91:68:4b:ab:69
********************
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /sys/class/net/lo/device
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/sys/class/net/lo/device'
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/lo/ifalias
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
********************
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /etc/systemd/network/10_eth0.network
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
[httpMetadataService] 2025/10/01 12:24:18 DEBUG - Using IMDSv2 with endpoint: /latest/api/token
[httpClient] 2025/10/01 12:24:18 DEBUG - Sending PUT request to endpoint 'http://169.254.169.254/latest/api/token'
[attemptRetryStrategy] 2025/10/01 12:24:18 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
[clientRetryable] 2025/10/01 12:24:18 DEBUG - [requestID=3192210f-afe7-422e-7b8f-784a39d2e442] Requesting (attempt=1): Request{ Method: 'PUT', URL: 'http://169.254.169.254/latest/api/token' }
[httpClient] 2025/10/01 12:24:18 DEBUG - Sending GET request to endpoint 'http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key'
[attemptRetryStrategy] 2025/10/01 12:24:18 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
[clientRetryable] 2025/10/01 12:24:18 DEBUG - [requestID=8b2e450b-8d24-4c4e-4444-ed495eb7e751] Requesting (attempt=1): Request{ Method: 'GET', URL: 'http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key' }
[File System] 2025/10/01 12:24:18 DEBUG - Getting HomeDir for vcap
[File System] 2025/10/01 12:24:18 DEBUG - HomeDir is /home/vcap
[File System] 2025/10/01 12:24:18 DEBUG - Making dir /home/vcap/.ssh with perm 0700
[File System] 2025/10/01 12:24:18 DEBUG - Chown /home/vcap/.ssh to user vcap
[File System] 2025/10/01 12:24:18 DEBUG - Writing /home/vcap/.ssh/authorized_keys
[File System] 2025/10/01 12:24:18 DEBUG - Making dir /home/vcap/.ssh with perm 0777
[File System] 2025/10/01 12:24:18 DEBUG - Write content
********************
ssh-rsa [REDACTED] bbl-vega_bosh_vms
********************
[File System] 2025/10/01 12:24:18 DEBUG - Chown /home/vcap/.ssh/authorized_keys to user vcap
[File System] 2025/10/01 12:24:18 DEBUG - Chmod /home/vcap/.ssh/authorized_keys to 384
[settingsService] 2025/10/01 12:24:18 DEBUG - Loading settings from fetcher
[File System] 2025/10/01 12:24:18 DEBUG - Glob '/sys/class/net/*'
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /sys/class/net/eth0/device
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/sys/class/net/eth0/device'
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/eth0/ifalias
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
********************
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/eth0/address
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
02:91:68:4b:ab:69
********************
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /sys/class/net/lo/device
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/sys/class/net/lo/device'
[File System] 2025/10/01 12:24:18 DEBUG - Reading file /sys/class/net/lo/ifalias
[File System] 2025/10/01 12:24:18 DEBUG - Read content
********************
********************
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /etc/systemd/network/10_eth0.network
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
[httpMetadataService] 2025/10/01 12:24:18 DEBUG - Using IMDSv2 with endpoint: /latest/api/token
[httpClient] 2025/10/01 12:24:18 DEBUG - Sending PUT request to endpoint 'http://169.254.169.254/latest/api/token'
[attemptRetryStrategy] 2025/10/01 12:24:18 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
[clientRetryable] 2025/10/01 12:24:18 DEBUG - [requestID=0d0271ca-4bcc-4da8-5f00-19e72d80bb74] Requesting (attempt=1): Request{ Method: 'PUT', URL: 'http://169.254.169.254/latest/api/token' }
[httpClient] 2025/10/01 12:24:18 DEBUG - Sending GET request to endpoint 'http://169.254.169.254/latest/user-data'
[attemptRetryStrategy] 2025/10/01 12:24:18 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
[clientRetryable] 2025/10/01 12:24:18 DEBUG - [requestID=9c4fa4c0-ecd1-44cd-7390-e8c42423bca2] Requesting (attempt=1): Request{ Method: 'GET', URL: 'http://169.254.169.254/latest/user-data' }
[settingsService] 2025/10/01 12:24:18 DEBUG - Successfully received settings from fetcher
[File System] 2025/10/01 12:24:18 DEBUG - Checking if file exists /var/vcap/bosh/update_settings.json
[File System] 2025/10/01 12:24:18 DEBUG - Stat '/var/vcap/bosh/update_settings.json'
[File System] 2025/10/01 12:24:18 DEBUG - Making dir /var/vcap/bosh with perm 0777
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Running command 'ip r'
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Successful: true (0)
[File System] 2025/10/01 12:24:18 DEBUG - Getting HomeDir for vcap
[File System] 2025/10/01 12:24:18 DEBUG - HomeDir is /home/vcap
[File System] 2025/10/01 12:24:18 DEBUG - Making dir /home/vcap/.ssh with perm 0700
[File System] 2025/10/01 12:24:18 DEBUG - Chown /home/vcap/.ssh to user vcap
[File System] 2025/10/01 12:24:18 DEBUG - Writing /home/vcap/.ssh/authorized_keys
[File System] 2025/10/01 12:24:18 DEBUG - Making dir /home/vcap/.ssh with perm 0777
[File System] 2025/10/01 12:24:18 DEBUG - Write content
********************
ssh-rsa [REDACTED]
ssh-rsa [REDACTED]
********************
[File System] 2025/10/01 12:24:18 DEBUG - Chown /home/vcap/.ssh/authorized_keys to user vcap
[File System] 2025/10/01 12:24:18 DEBUG - Chmod /home/vcap/.ssh/authorized_keys to 384
[Cmd Runner] 2025/10/01 12:24:18 DEBUG - Running command 'usermod -p $6$YYvzvruHvG8hE.yR$RZ2Yr75.2lSbLgo/zF07E54zlwdxNrSmz4tz469316wP8It/nLadxqrORyTH5OzjforrReKc3CYCJzBe4XjsD. root'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'usermod -p $6$YYvzvruHvG8hE.yR$RZ2Yr75.2lSbLgo/zF07E54zlwdxNrSmz4tz469316wP8It/nLadxqrORyTH5OzjforrReKc3CYCJzBe4XjsD. vcap'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[File System] 2025/10/01 12:24:19 DEBUG - Glob '/sys/class/net/*'
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /sys/class/net/eth0/device
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/sys/class/net/eth0/device'
[File System] 2025/10/01 12:24:19 DEBUG - Reading file /sys/class/net/eth0/ifalias
[File System] 2025/10/01 12:24:19 DEBUG - Read content
********************
********************
[File System] 2025/10/01 12:24:19 DEBUG - Reading file /sys/class/net/eth0/address
[File System] 2025/10/01 12:24:19 DEBUG - Read content
********************
02:91:68:4b:ab:69
********************
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /sys/class/net/lo/device
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/sys/class/net/lo/device'
[File System] 2025/10/01 12:24:19 DEBUG - Reading file /sys/class/net/lo/ifalias
[File System] 2025/10/01 12:24:19 DEBUG - Read content
********************
********************
[interfaceConfigurationCreator] 2025/10/01 12:24:19 DEBUG - Creating network configuration with settings: type: 'manual', ip: '10.0.16.22', netmask: '255.255.240.0', gateway: '10.0.16.1', mac: '02:91:68:4b:ab:69', resolved: 'true', preconfigured: 'false', use_dhcp: 'true'
[interfaceConfigurationCreator] 2025/10/01 12:24:19 DEBUG - Using dhcp networking
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
[File System] 2025/10/01 12:24:19 DEBUG - Skipping writing /etc/systemd/network/10_eth0.network because contents are identical
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/etc/dhcp/dhclient.conf'
[File System] 2025/10/01 12:24:19 DEBUG - Skipping writing /etc/dhcp/dhclient.conf because contents are identical
[attemptRetryStrategy] 2025/10/01 12:24:19 DEBUG - Making attempt #0 for ip.InterfaceAddressesValidator
[UbuntuNetManager] 2025/10/01 12:24:19 INFO - Successfully set up outgoing nats api firewall
[linuxPlatform] 2025/10/01 12:24:19 INFO - Setting up raw ephemeral disks
[virtioDevicePathResolver] 2025/10/01 12:24:19 DEBUG - Failed to get device real path by disk ID: ''. Error: 'Disk ID is not set', timeout: 'false'
[virtioDevicePathResolver] 2025/10/01 12:24:19 DEBUG - Using mapped resolver to get device real path
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /dev/xvdb
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/dev/xvdb'
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /dev/vdb
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/dev/vdb'
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /dev/sdb
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/dev/sdb'
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /dev/nvme1n1
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/dev/nvme1n1'
[linuxPlatform] 2025/10/01 12:24:19 INFO - Setting up ephemeral disk...
[File System] 2025/10/01 12:24:19 DEBUG - Glob '/var/vcap/data/*'
[File System] 2025/10/01 12:24:19 DEBUG - Making dir /var/vcap/data with perm 0750
[linuxPlatform] 2025/10/01 12:24:19 INFO - Creating swap & ephemeral partitions on ephemeral disk...
[linuxPlatform] 2025/10/01 12:24:19 DEBUG - Getting device size of `/dev/nvme1n1'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'lsblk --nodeps -nb -o SIZE /dev/nvme1n1'
[arping] 2025/10/01 12:24:19 DEBUG - Broadcasting MAC addresses
[File System] 2025/10/01 12:24:19 DEBUG - Checking if file exists /sys/class/net/eth0
[File System] 2025/10/01 12:24:19 DEBUG - Stat '/sys/class/net/eth0'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'arping -c 1 -U -I eth0 10.0.16.22'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout: 5368709120
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[linuxPlatform] 2025/10/01 12:24:19 DEBUG - Calculating partition sizes of `/dev/nvme1n1', with available size 5368709120B
[linuxPlatform] 2025/10/01 12:24:19 INFO - Partitioning `/dev/nvme1n1' with [[Type: swap, SizeInBytes: 2684354560] [Type: linux, SizeInBytes: 2684354560]]
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'parted -m /dev/nvme1n1 unit B print'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout: BYT;
/dev/nvme1n1:5368709120B:nvme:512:512:gpt:Amazon Elastic Block Store:;
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'udevadm settle'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[attemptRetryStrategy] 2025/10/01 12:24:19 DEBUG - Making attempt #0
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'wipefs --force -a /dev/nvme1n1'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stdout: /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme1n1: 8 bytes were erased at offset 0x13ffffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[PartedPartitioner] 2025/10/01 12:24:19 INFO - Successfully removed device path `/dev/nvme1n1'
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'parted -m /dev/nvme1n1 unit B print'
BYT;
/dev/nvme1n1:5368709120B:nvme:512:512:unknown:Amazon Elastic Block Store:;
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Stderr: Error: /dev/nvme1n1: unrecognised disk label
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Successful: true (0)
[EphemeralDevicePartitioner] 2025/10/01 12:24:19 DEBUG - Creating gpt table
[Cmd Runner] 2025/10/01 12:24:19 DEBUG - Running command 'parted -s /dev/nvme1n1 mklabel gpt'
[Cmd Runner] 2025/10/01 12:24:20 DEBUG - Stdout: ARPING 10.0.16.22 from 10.0.16.22 eth0
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
[Cmd Runner] 2025/10/01 12:24:20 DEBUG - Stderr:
[Cmd Runner] 2025/10/01 12:24:20 DEBUG - Successful: true (0)
[Cmd Runner] 2025/10/01 12:24:21 DEBUG - Stdout:
[Cmd Runner] 2025/10/01 12:24:21 DEBUG - Stderr: Error: Partition(s) 1, 2 on /dev/nvme1n1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[Cmd Runner] 2025/10/01 12:24:21 DEBUG - Successful: false (1)
[main] 2025/10/01 12:24:21 ERROR - App setup Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk '/dev/nvme1n1': Ensuring GPT table of `/dev/nvme1n1': Parted making label: Running command: 'parted -s /dev/nvme1n1 mklabel gpt', stdout: '', stderr: 'Error: Partition(s) 1, 2 on /dev/nvme1n1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
': exit status 1
[main] 2025/10/01 12:24:21 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk '/dev/nvme1n1': Ensuring GPT table of `/dev/nvme1n1': Parted making label: Running command: 'parted -s /dev/nvme1n1 mklabel gpt', stdout: '', stderr: 'Error: Partition(s) 1, 2 on /dev/nvme1n1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
': exit status 1
We try to restart the compilation vm but it never came back up again. BR, Aftab
I have a question after reading this. Based on the fact that the agent was not running and agent is a systemd unit I wonder, on the pre-start stage:
- Is the systemd unit enabled
- What is the status that systemctl gives
- What is the status of all dependencies of bosh agent. If its not running is the output of
systemctl status - How does the Mount table and list of devices/partitions in
lsblklook and iflsoffinds something. - are there kernel complaints about the agent?
journalctl -kfor dmesg from kernel orjournalctl -r -b -g "agent"I would expect the kernel to complain if you try to do a switcheroo on the file system.
🔍 Additional Analysis: Ephemeral Disk Partitioning Failures
I've been investigating the intermittent failures on Noble stemcells and noticed some recent comments mention partition table errors during bootstrap. This could be related to your timeout issues.
📋 Error Pattern Observed
Error: Partition(s) 1, 2 on /dev/nvme1n1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
🔎 Root Cause Analysis
This error occurs during the bosh-agent bootstrap phase when attempting to create a GPT partition table on the ephemeral disk. Here's what happens:
The Race Condition:
- Agent wipes partition signatures using
wipefs --force -a /dev/nvme1n1(with retry logic - up to 60 seconds) - Systemd/udev reacts to the device change and triggers background probes
- Agent runs
parted -s /dev/nvme1n1 mklabel gptto create new GPT - Parted writes GPT to disk ✅ (succeeds)
- Parted tries to inform kernel using
BLKPGioctl to update partition table - Kernel returns EBUSY ❌ - Old partitions still referenced
- Parted retries internally for 1 second (hardcoded in parted source)
- Parted gives up - Returns the error seen above
Why This Matters:
From the parted source code, the BLKPG ioctl fails with EBUSY when:
- Device mapper holds references
- Udev/systemd is actively probing the device
- Previous partition table is still cached in kernel (most likely cause)
Noble-Specific Factors:
- Systemd 255 - More aggressive device probing
- Udev rule changes - Different timing for device settle events
- NVMe driver updates - Partition detection behavior changes
- AWS NVMe specifics - Faster device attachment and different namespace semantics
🎯 Critical Gap in Agent Code
The agent has retry logic for wipefs (platform/disk/sfdisk_partition_strategy.go:29-45) - 20 attempts × 3 seconds = up to 60 seconds
But no retry logic for parted mklabel (platform/disk/ephemeral_device_partitioner.go:88-110) - Fails immediately if EBUSY
📝 Request for Additional Logs
To confirm this hypothesis and identify the root cause, could you provide the earliest agent logs from a failed Noble VM? Specifically:
- Full agent logs from VM bootstrap (from
/var/vcap/bosh/log/currentor journalctl output) - Time range: From when the agent starts until the first partition error
- What to look for:
- Are there multiple partition errors before the first timeout?
- Is udev actively probing devices during the partition operation?
- Are there other device-related errors?
Why this is important: The partition table error might be the actual root cause of the timeout errors. If the ephemeral disk setup fails during bootstrap, the agent might be unstable during subsequent operations (like long compilations), leading to ping timeouts.
The very first agent failure log will likely reveal whether this is:
- A timing issue (udev race condition)
- A kernel/driver issue (NVMe partition update semantics)
- A systemd issue (device probing interference)
This will help determine the best path forward for fixing the intermittent failures on Noble stemcells.
He , I have tried to reproduced this error again on bbl env with vanilla cf but this time without disk encryption.
# This was empty file for the failed compilation vm!
current -> /var/log/bosh-agent.log
Though, I was able to get the logs from the journalctl output. Therefore I have attached it here, but first few logs are:
/:~# journalctl -u bosh-agent
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Glob '/sys/class/net/*'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/eth0/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/eth0/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/address
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: 02:3c:85:1d:e7:c5
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/lo/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/lo/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/lo/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /etc/systemd/network/10_eth0.network
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpMetadataService] 2025/10/17 11:06:56 DEBUG - Using IMDSv2 with endpoint: /latest/api/token
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpClient] 2025/10/17 11:06:56 DEBUG - Sending PUT request to endpoint 'http://169.254.169.254/latest/api/token'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [clientRetryable] 2025/10/17 11:06:56 DEBUG - [requestID=52affe76-f1cc-458c-740e-f4c674b491a5] Requesting (attempt=1): Request{ Met>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpClient] 2025/10/17 11:06:56 DEBUG - Sending GET request to endpoint 'http://169.254.169.254/latest/meta-data/public-keys/0/ope>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [clientRetryable] 2025/10/17 11:06:56 DEBUG - [requestID=434e574b-6305-481e-63b6-6b8978cf5bab] Requesting (attempt=1): Request{ Met>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Getting HomeDir for vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - HomeDir is /home/vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /home/vcap/.ssh with perm 0700
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chown /home/vcap/.ssh to user vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Writing /home/vcap/.ssh/authorized_keys
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /home/vcap/.ssh with perm 0777
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Write content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDEWf/u+Rg2fIiCvVAYxQAMA6FXM45oCPjOIax48D4x1VaCFkLuiBkRIeQXIDKYZyKd4FcMUMI014yhjNiqCfsszsFRkba>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chown /home/vcap/.ssh/authorized_keys to user vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chmod /home/vcap/.ssh/authorized_keys to 384
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [settingsService] 2025/10/17 11:06:56 DEBUG - Loading settings from fetcher
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Glob '/sys/class/net/*'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/eth0/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/eth0/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/address
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: 02:3c:85:1d:e7:c5
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/lo/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/lo/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/lo/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /etc/systemd/network/10_eth0.network
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpMetadataService] 2025/10/17 11:06:56 DEBUG - Using IMDSv2 with endpoint: /latest/api/token
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpClient] 2025/10/17 11:06:56 DEBUG - Sending PUT request to endpoint 'http://169.254.169.254/latest/api/token'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [clientRetryable] 2025/10/17 11:06:56 DEBUG - [requestID=43b3ba7f-e801-4fe5-71de-a1c1ba623cce] Requesting (attempt=1): Request{ Met>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [httpClient] 2025/10/17 11:06:56 DEBUG - Sending GET request to endpoint 'http://169.254.169.254/latest/user-data'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0 for *httpclient.RequestRetryable
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [clientRetryable] 2025/10/17 11:06:56 DEBUG - [requestID=af006246-eaac-46a3-57c0-0efe5aa18139] Requesting (attempt=1): Request{ Met>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [settingsService] 2025/10/17 11:06:56 DEBUG - Successfully received settings from fetcher
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /var/vcap/bosh/update_settings.json
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/var/vcap/bosh/update_settings.json'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /var/vcap/bosh with perm 0777
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'ip r'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Getting HomeDir for vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - HomeDir is /home/vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /home/vcap/.ssh with perm 0700
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chown /home/vcap/.ssh to user vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Writing /home/vcap/.ssh/authorized_keys
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /home/vcap/.ssh with perm 0777
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Write content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDM5Z2n4gQRlkLk8nQdDvgrJNJcsSq4qOIWjNZizq1DgEfSuRZWtLmxhiV5yBdGGQ6cms8KOLqJbngmLUMmiUnSAVi3zzb>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDEWf/u+Rg2fIiCvVAYxQAMA6FXM45oCPjOIax48D4x1VaCFkLuiBkRIeQXIDKYZyKd4FcMUMI014yhjNiqCfsszsFRkba>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chown /home/vcap/.ssh/authorized_keys to user vcap
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Chmod /home/vcap/.ssh/authorized_keys to 384
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'usermod -p $6$YYvzvruHvG8hE.yR$RZ2Yr75.2lSbLgo/zF07E54zlwdxNrSmz4tz469316>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a usermod[922854]: change user 'root' password
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'usermod -p $6$YYvzvruHvG8hE.yR$RZ2Yr75.2lSbLgo/zF07E54zlwdxNrSmz4tz469316>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a usermod[922861]: change user 'vcap' password
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Glob '/sys/class/net/*'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/eth0/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/eth0/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/eth0/address
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: 02:3c:85:1d:e7:c5
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/lo/device
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/lo/device'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Reading file /sys/class/net/lo/ifalias
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Read content
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ********************
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [interfaceConfigurationCreator] 2025/10/17 11:06:56 DEBUG - Creating network configuration with settings: type: 'manual', ip: '10.0>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [interfaceConfigurationCreator] 2025/10/17 11:06:56 DEBUG - Using dhcp networking
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/etc/systemd/network/10_eth0.network'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Skipping writing /etc/systemd/network/10_eth0.network because contents are identical
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/etc/dhcp/dhclient.conf'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Skipping writing /etc/dhcp/dhclient.conf because contents are identical
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0 for ip.InterfaceAddressesValidator
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [UbuntuNetManager] 2025/10/17 11:06:56 INFO - Successfully set up outgoing nats api firewall
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 INFO - Setting up raw ephemeral disks
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [virtioDevicePathResolver] 2025/10/17 11:06:56 DEBUG - Failed to get device real path by disk ID: ''. Error: 'Disk ID is not set', >
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [virtioDevicePathResolver] 2025/10/17 11:06:56 DEBUG - Using mapped resolver to get device real path
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /dev/xvdb
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/dev/xvdb'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /dev/vdb
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/dev/vdb'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /dev/sdb
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/dev/sdb'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /dev/nvme1n1
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/dev/nvme1n1'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 INFO - Setting up ephemeral disk...
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Glob '/var/vcap/data/*'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Making dir /var/vcap/data with perm 0750
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 INFO - Creating swap & ephemeral partitions on ephemeral disk...
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 DEBUG - Getting device size of `/dev/nvme1n1'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [arping] 2025/10/17 11:06:56 DEBUG - Broadcasting MAC addresses
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'lsblk --nodeps -nb -o SIZE /dev/nvme1n1'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Checking if file exists /sys/class/net/eth0
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [File System] 2025/10/17 11:06:56 DEBUG - Stat '/sys/class/net/eth0'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'arping -c 1 -U -I eth0 10.0.16.20'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout: 5368709120
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 DEBUG - Calculating partition sizes of `/dev/nvme1n1', with available size 5368709120B
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [linuxPlatform] 2025/10/17 11:06:56 INFO - Partitioning `/dev/nvme1n1' with [[Type: swap, SizeInBytes: 2684354560] [Type: linux, Si>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'parted -m /dev/nvme1n1 unit B print'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout: BYT;
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: /dev/nvme1n1:5368709120B:nvme:512:512:gpt:Amazon Elastic Block Store:;
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'udevadm settle'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [attemptRetryStrategy] 2025/10/17 11:06:56 DEBUG - Making attempt #0
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'wipefs --force -a /dev/nvme1n1'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stdout: /dev/nvme1n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52>
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: /dev/nvme1n1: 8 bytes were erased at offset 0x13ffffe00 (gpt): 45 46 49 20 50 41 52 54
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: /dev/nvme1n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr:
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [PartedPartitioner] 2025/10/17 11:06:56 INFO - Successfully removed device path `/dev/nvme1n1'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'parted -m /dev/nvme1n1 unit B print'
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [129B blob data]
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: /dev/nvme1n1:5368709120B:nvme:512:512:unknown:Amazon Elastic Block Store:;
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Stderr: Error: /dev/nvme1n1: unrecognised disk label
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Successful: true (0)
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [EphemeralDevicePartitioner] 2025/10/17 11:06:56 DEBUG - Creating gpt table
Oct 17 11:06:56 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:56 DEBUG - Running command 'parted -s /dev/nvme1n1 mklabel gpt'
Oct 17 11:06:57 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:57 DEBUG - Stdout: ARPING 10.0.16.20 from 10.0.16.20 eth0
Oct 17 11:06:57 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: Sent 1 probes (1 broadcast(s))
Oct 17 11:06:57 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: Received 0 response(s)
Oct 17 11:06:57 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:57 DEBUG - Stderr:
Oct 17 11:06:57 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:57 DEBUG - Successful: true (0)
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:58 DEBUG - Stdout:
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:58 DEBUG - Stderr: Error: Partition(s) 1, 2 on /dev/nvme1n1 have been written, but we have been unabl>
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [Cmd Runner] 2025/10/17 11:06:58 DEBUG - Successful: false (1)
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [main] 2025/10/17 11:06:58 ERROR - App setup Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitionin>
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ': exit status 1
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: [main] 2025/10/17 11:06:58 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral di>
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a bosh-agent[922820]: ': exit status 1
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a systemd[1]: bosh-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a systemd[1]: bosh-agent.service: Failed with result 'exit-code'.
Oct 17 11:06:58 eefc6c8f-56ce-479e-8255-24ce2c53fa1a systemd[1]: bosh-agent.service: Scheduled restart job, restart counter is at 4191.
🔍 Analysis Update: Partition Table Race Condition on Noble Stemcells
Thank you @iaftab-alam for providing the detailed agent logs. I've completed a thorough analysis of the 192K-line log file and found definitive evidence of the root cause.
📊 Quantitative Findings
From the agent logs spanning ~33 minutes:
- 790 restart attempts (restart counters 495 → 1,284)
- 2,370 kernel notification failures across all attempts
- ~2.5 second average interval between restarts
- 100% failure rate - not a single successful recovery
- 0 wipefs failures - existing retry logic works perfectly
🐛 Confirmed Failure Pattern
Every single restart follows this exact sequence:
1. Agent starts → parted print (check state)
2. wipefs --force -a /dev/nvme1n1 ← SUCCESS ✅
3. parted print (verify wipefs)
4. parted -s /dev/nvme1n1 mklabel gpt ← FAILURE ❌ (~2s later)
└─ "unable to inform the kernel of the change, probably because it/they are in use"
5. Agent exits with error
6. Systemd restarts agent (2-3 seconds later)
7. Loop repeats indefinitely
Sample timing from logs:
08:32:56 - wipefs + parted (attempt 495) → 08:32:58 FAILURE
08:32:59 - wipefs + parted (attempt 496) → 08:33:01 FAILURE
08:33:01 - wipefs + parted (attempt 497) → 08:33:03 FAILURE
[... continues for 790 attempts over 33 minutes ...]
💡 Root Cause Analysis
The Race Condition:
- Wipefs succeeds - Agent wipes partition table with retry logic (up to 60 seconds total)
- Systemd/udev reacts - Background device probing begins
- Agent immediately calls parted - No delay, no retry logic
- Parted writes GPT but kernel notification fails with
EBUSY(device in use by udev) - Parted's internal retry expires (~1 second timeout, insufficient for udev to settle)
- Agent exits, systemd restarts in 2-3 seconds
- Device never settles → infinite loop
Why Noble is Affected (vs Jammy):
| Factor | Jammy (22.04) ✅ | Noble (24.04) ❌ |
|---|---|---|
| systemd version | 249 (slower device probing) | 255 (aggressive probing) |
| udev rules | Less reactive | More reactive to changes |
| Timing window | Parted's 1s retry succeeds | Parted's 1s retry fails |
| Result | No retry needed | Systematic failure |
🔎 Critical Gap Identified
The bosh-agent has inconsistent retry logic for partition operations:
- wipefs operation: Has retry logic (20 attempts × 3s = 60 second window)
- parted mklabel operation: No retry logic - fails immediately on EBUSY
The problem is not parted itself (it works correctly on Jammy), but rather that Noble's systemd 255 + updated udev rules create a timing window that exceeds parted's internal 1-second retry timeout. The agent needs to retry the entire mklabel operation, just like it retries wipefs.
📝 Request for Additional Context
To complete the investigation, could you provide agent logs from the very first failure on a Noble VM? The logs shared so far show the VM already in a restart loop (counter 495+), but the initial failure might reveal:
- What triggered the first partition operation?
- Were there any successful boots before the first failure?
- Are there other errors that preceded the partition issue?
What to look for:
- Agent logs from VM bootstrap (before restart counter > 10)
- Time range: From initial agent start until first partition error
- Kernel logs if available (journalctl -k) to see udev activity
🎯 Connection to Ping Timeouts
This partition failure during bootstrap likely explains the intermittent ping timeout errors you reported. If the ephemeral disk setup never completes:
- Agent enters infinite restart loop
- Compilation VMs never become ready
- Director times out waiting for VM to respond (600 seconds)
- Even if agent briefly responds, unstable disk state during long compilations may cause failures
The partition table error is likely the actual root cause, not just a symptom.
Next steps: Once we confirm the hypothesis with the initial failure logs, we can discuss the appropriate fix approach.
untested first draft of parted retry logic and wiping partition entries before wipefs in PR https://github.com/cloudfoundry/bosh-agent/tree/partx-parted
Hey @rkoster Well getting the first ever logs are little bit tricky. It gets cleaned up and not available anymore.
Yeah I agree, @rubionic is a bit unreasonable here 😞