Failed on upgrading BOSH Director from v271.2.0 to v280.0.14
Describe the bug: Upgrading the BOSH Director from v271.2.0 to v280.0.14 fails.
To Reproduce:
Deploy a BOSH Director v271.2.0 on vSphere:
$ ./create-env.sh sandbox-cfar 271.2.0
Deployment manifest: '/SANDBOX-CFAR/bosh-director/bosh-deployment-271.2.0/bosh.yml'
Deployment state: '/SANDBOX-CFAR/bosh-director/sandbox-cfar-state.json'
Started validating
Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
Validating release 'bosh'... Finished (00:00:03)
Downloading release 'bpm'... Finished (00:00:03)
Validating release 'bpm'... Finished (00:00:02)
Downloading release 'bosh-vsphere-cpi'... Finished (00:00:00)
Validating release 'bosh-vsphere-cpi'... Finished (00:00:01)
Downloading release 'uaa'... Finished (00:00:09)
Validating release 'uaa'... Finished (00:00:05)
Downloading release 'credhub'... Finished (00:00:03)
Validating release 'credhub'... Finished (00:00:02)
Downloading release 'os-conf'... Finished (00:00:00)
Validating release 'os-conf'... Finished (00:00:00)
Downloading release 'backup-and-restore-sdk'... Finished (00:00:05)
Validating release 'backup-and-restore-sdk'... Finished (00:00:09)
Validating cpi release... Finished (00:00:00)
Validating deployment manifest... Finished (00:00:00)
Downloading stemcell... Finished (00:00:12)
Validating stemcell... Finished (00:00:05)
Finished validating (00:01:26)
Started installing CPI
Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:04:19)
Compiling package 'vsphere_cpi/5dffb632edb799be8e2c7aeed263409627b201d6143ce427621f40d6dd461993'... Finished (00:01:53)
Compiling package 'iso9660wrap/b9eee11ca7251f93ef853db345596783012ae26b5d6ec5cb3d29bf295899c973'... Finished (00:00:00)
Installing packages... Finished (00:00:01)
Rendering job templates... Finished (00:00:00)
Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:06:15)
Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-bionic-go_agent/1.92'... Finished (00:01:26)
Started deploying
Creating VM for instance 'bosh/0' from stemcell 'sc-74133471-3d5c-4444-8ae0-1b749056bf79'... Finished (00:01:16)
Waiting for the agent on VM 'vm-2e30ee54-968d-4407-b0e2-0a2c448f6695' to be ready... Finished (00:00:10)
Creating disk... Finished (00:00:28)
Attaching disk 'disk-36f89546-442f-4600-b482-ed148588a756' to VM 'vm-2e30ee54-968d-4407-b0e2-0a2c448f6695'... Finished (00:01:08)
Rendering job templates... Finished (00:00:22)
Compiling package 'golang/7b633f7a140b41ef9427109d0f3032cf81445ead'... Finished (00:00:27)
Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:03:18)
Compiling package 'mysql/788d06685e1ea1d316759eeeb506782ec7f9302f8c21e2ff04cd4703579f0935'... Finished (00:00:46)
Compiling package 'libpq/ecbfa62322b4124f25372a19d68b83295b4d290503153667ec378e3196c45f69'... Finished (00:00:28)
Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:03:15)
Compiling package 'database-backup-restorer-boost/05f72399bdd8d91643f42ac411ba65befb78ac0334484dbc3ca95c5286ab7680'... Finished (00:00:19)
Compiling package 'tini/3d7b02f3eeb480b9581bec4a0096dab9ebdfa4bc'... Finished (00:00:02)
Compiling package 'bpm-runc/3dcaebacd63b8adc75c5f32954f11041885347b1'... Finished (00:01:47)
Compiling package 'openjdk_1.8.0/225f67373c9ad0a1da464aeb92f06207bd3e8da1'... Finished (00:00:08)
Compiling package 'golang-1-linux/7fdbb13e913f2f05232da046b27642ceebab32adf2e78ef3582b63ae6d60df96'... Finished (00:00:27)
Compiling package 'libpcre2/d5cd2e4263fda94bfeec68d2a388b9e6bb17fa15e28e09c99ebe6a4faa3328f5'... Finished (00:00:14)
Compiling package 'director/f32385256198535b797059dd4990fcb3b65c0c07337990163c24275a7a29b7e1'... Finished (00:01:25)
Compiling package 'verify_multidigest/64d1958934e10a0eccc05ddf0d7ba0c8215e6f6d4c227cb93998087335378fa8'... Finished (00:00:01)
Compiling package 'vsphere_cpi/5dffb632edb799be8e2c7aeed263409627b201d6143ce427621f40d6dd461993'... Finished (00:01:18)
Compiling package 'davcli/58f558960854f58c55e3d506d3906019178dbc189fbbed1616b8b3c7c02142ea'... Finished (00:00:01)
Compiling package 'gonats/f58980bd4b0436ff65f588627116dfff63f346f4d13175b7ba47380ab89e08a6'... Finished (00:00:01)
Compiling package 'database-backup-restorer-postgres-9.4/70d321821ff300fbaef47d64fb7f7b5d33ede23c2349cbf1950886c40f25c2e8'... Finished (00:04:36)
Compiling package 'database-backup-restorer-postgres-10/41f9bdf0c158e18e850a5744250a39b425f385529b234941c9acf1f6631a3424'... Finished (00:05:14)
Compiling package 'database-backup-restorer-mysql-5.7/81418214987edce3b03159014ac68449689086d696be746e14857f7551f8f3f6'... Finished (00:02:51)
Compiling package 'nginx/d4cf69d3e81bed005ebba5bc0bc8d2c28252e70ad47ff455479a9838d5f9b0e4'... Finished (00:01:02)
Compiling package 'database-backup-restorer-postgres-13/0c18508216826e03c23c623d2f1989405831375c9d457e0ac619125c32b15371'... Finished (00:06:01)
Compiling package 'database-backup-restorer-postgres-11/be5ee4b5015679ea4d92295ea1eb9a58480c3fff155f69cd1a92f800c11a0c91'... Finished (00:05:38)
Compiling package 'bpm/818bd9ec39fa5e179c5406c1690fb7c6deb0fc4d'... Finished (00:00:11)
Compiling package 'postgres-9.4/601f3635b43d0e7ba3ae866e3bd69425cdf33f7fb34a7f1bb21cc26818fb598e'... Finished (00:04:31)
Compiling package 'credhub/33ea568aad1d35e9522c56f792d3d4fc3cd5975d'... Finished (00:00:07)
Compiling package 's3cli/7e752dee192da026f6a0cdf2653b855cc6efbe6b041564660f8520c39ddd5a78'... Finished (00:00:02)
Compiling package 'health_monitor/dd842698e83edeae08bdcc6e672429a5cee3b755645d2024d97b6213f1281d44'... Finished (00:00:34)
Compiling package 'database-backup-restorer/7c0d80a713009aecb8d6533918a2bf45f7ad0319f50ecca1789fc230aa6d5dd9'... Finished (00:00:06)
Compiling package 'database-backup-restorer-mariadb/af78e79c98c11c29a721b1d7ba554dd7d0bf25e2789fa933b96bbfd67d697465'... Finished (00:02:12)
Compiling package 'luna-hsm-client-7.4/746f3c30aadc0af7afc2d5cddcc16d8836a8f845'... Finished (00:00:04)
Compiling package 'postgres-10/708f8446db4ac7bb21bddce9938e217c741a6e6f82f6209f7e6f6a2b5b25eed3'... Finished (00:05:05)
Compiling package 'bosh-gcscli/52223432539bbd0607db053f542440869688b4404dd65f2ddf33c2d195b1b891'... Finished (00:00:02)
Compiling package 'uaa/4f77a97610b962f50d0c21067b48bd467db6066855318c766af8bc1cb990e799'... Finished (00:00:35)
Compiling package 'iso9660wrap/b9eee11ca7251f93ef853db345596783012ae26b5d6ec5cb3d29bf295899c973'... Finished (00:00:01)
Compiling package 'database-backup-restorer-mysql-5.6/01bf18f19277261bcccac9736d7634b49eb184a93cd6549b78f4e1d75eabe35a'... Finished (00:02:14)
Compiling package 'database-backup-restorer-postgres-9.6/6a8fcf2d66b67507403df885b84c4b7cc1d66289f2d7efc5914b43dd2305491c'... Finished (00:05:07)
Updating instance 'bosh/0'... Finished (00:03:08)
Waiting for instance 'bosh/0' to be running... Finished (00:01:46)
Running the post-start scripts 'bosh/0'... Finished (00:00:21)
Finished deploying (01:09:07)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)
Succeeded
root@0036416c4de8:/SANDBOX-CFAR/bosh-director# . bosh-env.sh sandbox-cfar 271.2.0
root@0036416c4de8:/SANDBOX-CFAR/bosh-director# bosh env
Using environment '10.9.202.186' as client 'admin'
Name sandbox-cfar
UUID a234617f-6e58-462f-ac51-52c722c3834b
Version 271.2.0 (00000000)
Director Stemcell ubuntu-bionic/1.92
CPI vsphere_cpi
Features compiled_package_cache: disabled
config_server: enabled
local_dns: enabled
power_dns: disabled
snapshots: disabled
User admin
Succeeded
Upload stemcell ubuntu-bionic 1.92
Deploy cf-deployment 21.5.0.
Upgrade the current bosh director v271.2.0 to v280.0.14
$ ./create-env.sh sandbox-cfar 280.0.14
Deployment manifest: '/var/vcap/store/deployment-vm/home/ptran/workspace/SANDBOX-CFAR/bosh-director/bosh-deployment-280.0.14/bosh.yml'
Deployment state: '/var/vcap/store/deployment-vm/home/ptran/workspace/SANDBOX-CFAR/bosh-director/sandbox-cfar-state.json'
Started validating
Downloading release 'bosh'... Finished (00:00:01)
Validating release 'bosh'... Finished (00:00:01)
Downloading release 'bpm'... Finished (00:00:00)
Validating release 'bpm'... Finished (00:00:00)
Downloading release 'bosh-vsphere-cpi'... Finished (00:00:01)
Validating release 'bosh-vsphere-cpi'... Finished (00:00:02)
Downloading release 'uaa'... Finished (00:00:03)
Validating release 'uaa'... Finished (00:00:02)
Downloading release 'credhub'... Finished (00:00:01)
Validating release 'credhub'... Finished (00:00:01)
Downloading release 'os-conf'... Finished (00:00:00)
Validating release 'os-conf'... Finished (00:00:00)
Downloading release 'backup-and-restore-sdk'... Finished (00:00:04)
Validating release 'backup-and-restore-sdk'... Finished (00:00:03)
Validating cpi release... Finished (00:00:00)
Validating deployment manifest... Finished (00:00:00)
Downloading stemcell... Skipped [Found in local cache] (00:00:00)
Validating stemcell... Finished (00:00:12)
Finished validating (00:00:39)
Started installing CPI
Compiling package 'ruby-3.1/8b225e7cc2608305a7b784b5828b2b4b7c7adc3eb14af46e313d64a9e14a3ad6'... Finished (00:03:39)
Compiling package 'golang-1-darwin/e6383fc2adbcb1dc5ab18d32b737b1729ff3226b774a358504a44bc5d6bd097f'... Finished (00:00:23)
Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Finished (00:00:24)
Compiling package 'vsphere_cpi/54bcc7a48ba47cc7df2b8dd4704bc8dbb46b945b1a91cbc147262803557a6a7a'... Finished (00:00:35)
Compiling package 'iso9660wrap/b351c796826a0a3a57e13bad036c12a3958c38f9370bbb50540e782582baaf79'... Finished (00:00:31)
Installing packages... Finished (00:00:07)
Rendering job templates... Finished (00:00:00)
Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:05:41)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.340'... Skipped [Stemcell already uploaded] (00:00:00)
Started deploying
Waiting for the agent on VM 'vm-aef0966d-e843-41ff-873d-2acfe6ee88bb'... Finished (00:00:00)
Draining jobs on instance 'unknown/0'... Finished (00:00:07)
Stopping jobs on instance 'unknown/0'... Finished (00:00:00)
Unmounting disk 'disk-36f89546-442f-4600-b482-ed148588a756'... Finished (00:00:01)
Deleting VM 'vm-aef0966d-e843-41ff-873d-2acfe6ee88bb'... Finished (00:00:22)
Creating VM for instance 'bosh/0' from stemcell 'sc-74437d41-122f-4224-a3e1-6266ff62e4df'... Finished (00:00:58)
Waiting for the agent on VM 'vm-57d3af3a-29bf-4b39-944b-3bcb03d5a164' to be ready... Finished (00:00:29)
Attaching disk 'disk-36f89546-442f-4600-b482-ed148588a756' to VM 'vm-57d3af3a-29bf-4b39-944b-3bcb03d5a164'... Finished (00:00:40)
Rendering job templates... Finished (00:00:28)
Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Skipped [Package already compiled] (00:00:00)
Compiling package 'golang-1-darwin/e6383fc2adbcb1dc5ab18d32b737b1729ff3226b774a358504a44bc5d6bd097f'... Finished (00:00:36)
Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Finished (00:00:35)
Compiling package 'ruby-3.1/8b225e7cc2608305a7b784b5828b2b4b7c7adc3eb14af46e313d64a9e14a3ad6'... Finished (00:15:25)
Compiling package 'director-ruby-3.2/84ee2f9d0485530a75822fa03e7fd0c73544aa4c2f6fe24aaebebe1757195efe'... Skipped [Package already compiled] (00:00:00)
Compiling package 'tini/3d7b02f3eeb480b9581bec4a0096dab9ebdfa4bc'... Skipped [Package already compiled] (00:00:00)
Compiling package 'bpm-runc/923e2cae4f8f54cd58de0349352bb14f8662cfa5'... Skipped [Package already compiled] (00:00:00)
Compiling package 'libopenssl1/7f27f8cdc6cd6f6f865bfbe67ab853977e1505d2ca558415df9bf692eb1b0d63'... Skipped [Package already compiled] (00:00:00)
Compiling package 'openjdk_17.0/a805b67e0bbf99e97ca878960971301e56d951f67ab5ca14be11553b356556e8'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-boost/05f72399bdd8d91643f42ac411ba65befb78ac0334484dbc3ca95c5286ab7680'... Skipped [Package already compiled] (00:00:00)
Compiling package 'libpcre2/22fb4c5ee63919fa1e4b1e720fe048f8c55d8998858aeb8172ca67cbdcd0e6de'... Skipped [Package already compiled] (00:00:00)
Compiling package 'mysql/7ec79ca2b57047da0b337c62944439493b60c1bd5a2767444362cfd1c7b2bbd9'... Skipped [Package already compiled] (00:00:00)
Compiling package 'libpq/b309a72768019e24e2c592f3f25ded2679e98cbb90f774c3a4d6b7745760079f'... Skipped [Package already compiled] (00:00:00)
Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Skipped [Package already compiled] (00:00:00)
Compiling package 'postgres-15/1059ac62d543dc19011001f80f8c0bb99cc3a9ea4f8c14736e480701051ce9f0'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-postgres-15/162c4cca97dcfd5b12d4241bf40ae421cb3c4fbdbf215ce601f3267865501f66'... Skipped [Package already compiled] (00:00:00)
Compiling package 'luna-hsm-client-7.4/5956cbd4d17c28c2e4c29f3906e3faddc1d7b921708740f1a532a37d5b6fbe29'... Skipped [Package already compiled] (00:00:00)
Compiling package 'iso9660wrap/b351c796826a0a3a57e13bad036c12a3958c38f9370bbb50540e782582baaf79'... Finished (00:00:29)
Compiling package 'vsphere_cpi/54bcc7a48ba47cc7df2b8dd4704bc8dbb46b945b1a91cbc147262803557a6a7a'... Finished (00:01:07)
Compiling package 'database-backup-restorer-mysql-8.0/488fb8d45895a348f88ca2984fa36939687ad6978deebabd8ee70a1514776f17'... Skipped [Package already compiled] (00:00:00)
Compiling package 'nats/52d36e5308f7aeced172092016c0fd34f9195ff2788d3106fc2d5cf1ac192c1a'... Skipped [Package already compiled] (00:00:00)
Compiling package 'bpm/a37a126c1b31da99ab252f4668953a38c4748864'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-mysql-5.6/86603abfbb0d59ebf924449e97fecc422af66d7941bf5498a05099b653a8d3eb'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-postgres-13/ea27ff50286f247ab3acdb3c7cc2101c6d7a666a4eec7c669f7e34e3ef1b51e6'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-postgres-11/b9125bf430a1cf1d00ab83c72e4c5be26f6de52c5315b82beda286d31f4e7cc1'... Skipped [Package already compiled] (00:00:00)
Compiling package 'davcli/ca2605d13c62b479a215162ea17769326d6f7e37d1002c85816534013235b7d4'... Skipped [Package already compiled] (00:00:00)
Compiling package 'credhub/e3913a55fb5116fdca99c6403a19a94e7e051e4cd255ab972be279f86ef50de9'... Skipped [Package already compiled] (00:00:00)
Compiling package 'azure-storage-cli/90a54f4a65a0bfa7d1dc7c651467c1d1b19a009ccbb071ec4ccae42ba903c811'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-mysql-5.7/b1576d316b0046ec60cbbc3ef148eed266daca19992d5b228167a7dfb7059c34'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-mariadb/f66c894e04cf0b91155bf3a3c0af46ff3ce6957ea5f2c07112ba3ead4a185513'... Skipped [Package already compiled] (00:00:00)
Compiling package 'postgres-10/e3f2ed31116e1a0c929ae6fcdde983a9d6c000c25cafde8a784fd126e06400f9'... Skipped [Package already compiled] (00:00:00)
Compiling package 's3cli/93d30c08e76d18cf878007359b18c1d1c1c0fb92c757d06bb0bb09de60f2c765'... Skipped [Package already compiled] (00:00:00)
Compiling package 'verify_multidigest/ffa02c5cc46c56c8006a5c081a16e76b4353f99de7ccc1605c01a95ae47f2fbd'... Skipped [Package already compiled] (00:00:00)
Compiling package 'health_monitor/5a419aae8750e7fe3f368f6695f8c60fc7d80e8a547d542137d6fbf782cee7fa'... Skipped [Package already compiled] (00:00:00)
Compiling package 'director/31ce6b1831288b9080178caf68f40d7c59d0743b2f736b449aab842d199fbc4c'... Skipped [Package already compiled] (00:00:00)
Compiling package 'uaa/2210f02ea85373965968f01d0291a1208d4b6e2e85616a95b477a4354cb93674'... Skipped [Package already compiled] (00:00:03)
Compiling package 'nginx/82a22b536cf378d354f9325dadcbcb2fa70b1ce9e37eb65a8a7a97cd35e8fc45'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer/84b24a5d9b0a1c07b6484bf908700e2d7990b718e4fd2ce5ee4545337109df2f'... Skipped [Package already compiled] (00:00:00)
Compiling package 'bosh-gcscli/6394d55f449cad79d0f825815777c3f9f06efcae67850796e905e6aab7e9335b'... Skipped [Package already compiled] (00:00:00)
Compiling package 'postgres-13/a3141b9f3664abe145c6fb452a54b3bbc4b772933083c2c1ef725c0a7c71824f'... Skipped [Package already compiled] (00:00:00)
Compiling package 'database-backup-restorer-postgres-10/f4a7d1e2aaad5f2aabb6b0dcbcaedb49305f0d62373af72e2ee8f01eaa595be9'... Skipped [Package already compiled] (00:00:00)
Updating instance 'bosh/0'... Failed (00:04:49)
Failed deploying (00:26:41)
Cleaning up rendered CPI jobs... Finished (00:00:00)
Deploying:
Running the pre-start script:
Sending 'get_task' to the agent:
Agent responded with error: Action Failed get_task: Task 288aece3-c64b-4578-5bf7-c6a7c8058142 result: 1 of 8 pre-start scripts failed. Failed Jobs: postgres. Successful Jobs: blobstore, nats, bpm, director, user_add, credhub, uaa.
Exit code 1
The pre-start script of the postgres job failed.
Expected behavior: The BOSH Director should upgrade successfully from v271.2.0 to v280.0.14.
Logs: After SSHing into the BOSH Director VM, I found this error in /var/vcap/sys/log/postgres/pre-start.stdout.log:
bosh/0:~$ sudo -i
bosh/0:~# monit summary
/var/vcap/bosh/etc/monitrc:8: Warning: include files not found '/var/vcap/monit/job/*.monitrc'
The Monit daemon 5.2.5 uptime: 20m
System 'system_8c7a4cee-d163-4cd5-4d8c-cd2c5d15cd6f' running
bosh/0:~# ls /var/vcap/sys/log/postgres/ -hal
total 12K
drwxrwx--- 2 root vcap 4.0K Jan 27 05:44 .
drwxr-x--- 16 root vcap 4.0K Jan 27 05:44 ..
-rw-r----- 1 root root 0 Jan 27 05:44 pre-start.stderr.log
-rw-r----- 1 root root 283 Jan 27 05:44 pre-start.stdout.log
bosh/0:~# cat /var/vcap/sys/log/postgres/pre-start.stderr.log
bosh/0:~# cat /var/vcap/sys/log/postgres/pre-start.stdout.log
kernel.shmmax = 67108864
copying contents of postgres-10 to postgres-15 for postgres upgrade...
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
The source cluster was not shut down cleanly.
Failure, exiting
While migrating the database from Postgres 10 to Postgres 15 during the upgrade, the BOSH Director complains that the source database (Postgres 10?) was not shut down cleanly. I reran the BOSH Director upgrade several times, but it did not help.
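The failure signature quoted above can be checked for directly before digging further. This is a minimal sketch, assuming it runs as root on the Director VM; the function name source_cluster_unclean is hypothetical, while the log path and the message text are the ones shown in the log output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BOSH): exits 0 if the given pre-start log
# contains the "not shut down cleanly" message from the Postgres upgrade check.
source_cluster_unclean() {
  # Default to the log path shown in the issue; a different path can be passed.
  local log_file="${1:-/var/vcap/sys/log/postgres/pre-start.stdout.log}"
  grep -qF "The source cluster was not shut down cleanly." "${log_file}"
}
```

On the Director VM this could be used as: source_cluster_unclean && echo "Postgres 10 data directory was not shut down cleanly". It only recognizes this one message; any other pre-start failure still needs the log read by hand.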
Versions:
- Infrastructure: vSphere
- BOSH versions: from 271.2.0 to 280.0.14
- BOSH CLI version (bosh -v): 6.1.1-a0c78bc2-2019-10-25T22:16:25Z
- Stemcell versions: ubuntu-bionic/1.92 for the current BOSH Director v271.2.0; ubuntu-jammy/1.340 for the new BOSH Director v280.0.14
- Other release versions in use (from the interpolated manifest):
yq '.releases' releases-280.0.14/interpolated-bosh-director-280.0.14.yml
- name: bosh
sha1: f7fd9b040ab56b9c88dd6c4dfc23fdf682c7d4ad
url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/bosh-280.0.14-ubuntu-jammy-1.340-20240111-153544-517049233-20240111153545.tgz
version: 280.0.14
- name: bpm
sha1: 6ac7f9a016075ed69b6808dfb544146a73565a9f
url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/bpm-1.2.13-ubuntu-jammy-1.340-20240110-224040-652943252-20240110224041.tgz
version: 1.2.13
- name: bosh-vsphere-cpi
sha1: ddcf851983f672b1186590244d94f7dffb959ff2
url: https://bosh.io/d/github.com/cloudfoundry/bosh-vsphere-cpi-release?v=97.0.5
version: 97.0.5
- name: uaa
sha1: a8d7847cf4b5829bcfc085565dfb78697fbc3bb5
url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/uaa-76.31.0-ubuntu-jammy-1.340-20240119-145417-377757494-20240119145421.tgz
version: 76.31.0
- name: credhub
sha1: e9229b2bb5681f9ef8911e653e9719de628b3904
url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/credhub-2.12.58-ubuntu-jammy-1.340-20240111-190030-621523752-20240111190032.tgz
version: 2.12.58
- name: os-conf
sha1: daf34e35f1ac678ba05db3496c4226064b99b3e4
url: https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=22.2.1
version: 22.2.1
- name: backup-and-restore-sdk
sha1: 28ea9cbf00d89d4d4c363f4459d79268e44ac65f
url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/backup-and-restore-sdk-1.18.116-ubuntu-jammy-1.340-20240115-082356-879977937-20240115082400.tgz
version: 1.18.116
Deployment info: We use the "bosh create-env" command with bosh-deployment to create and upgrade the BOSH Director environment. The BOSH Director creation script:
#!/usr/bin/env bash
BIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"
if [[ $# -lt 2 ]]
then
echo "Usage: $0 <env_name> <bosh_director_version>" 1>&2
echo "Example: $0 sandbox-cfar 280.0.14" 1>&2
exit 1
fi
env_name=${1}
bosh_director_version=${2}
bosh create-env ${BIN_DIR}/bosh-deployment-${bosh_director_version}/bosh.yml \
--state=${BIN_DIR}/${env_name}-state.json \
--vars-store=${BIN_DIR}/${env_name}-creds.yml \
-l ${BIN_DIR}/${env_name}-vars-${bosh_director_version}.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/vsphere/cpi.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/uaa.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/credhub.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/jumpbox-user.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/bbr.yml \
-o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/experimental/enable-metrics.yml \
-o ${BIN_DIR}/ops/configure-uaa-ldap.yml \
-o ${BIN_DIR}/ops/change-uaa-login-prompt.yml \
-o ${BIN_DIR}/ops/map-ldap-to-uaa-groups.yml \
-o ${BIN_DIR}/ops/use-bosh-compiled-releases-from-artifactory-${bosh_director_version}.yml \
-o ${BIN_DIR}/ops/use-bosh-stemcell-from-artifactory-${bosh_director_version}.yml \
-o ${BIN_DIR}/ops/vsphere.yml \
-o ${BIN_DIR}/ops/dns.yml \
-o ${BIN_DIR}/ops/ntp.yml \
-o ${BIN_DIR}/ops/passwd.yml \
-o ${BIN_DIR}/ops/disk-pools.yml \
-o ${BIN_DIR}/ops/set-credhub-minimum-certificate-duration.yml
new bosh-deployment: https://github.com/cloudfoundry/bosh-deployment/tree/15cbd254db78ab49ef957f2d80ffd2901b09d6e5
It seems like you are upgrading from an ancient version of Postgres. This issue was fixed here: https://github.com/cloudfoundry/bpm-release/pull/152
Thank you so much for the response @rkoster! Indeed we're operating an "outdated" BOSH environment and have not done the upgrade regularly as we should. We have seen this issue intermittently on a few runs of BOSH Director upgrade testing.
How can we move forward with this BOSH Director v280.0.14 upgrade and ensure that this issue won't happen in our existing production BOSH environments?
Option 1: Can we first manually shut down Postgres 10 on the BOSH Director VM before attempting BOSH Director upgrade? If yes, which command sequences should be used to properly shut down Postgres 10 and other BOSH Director related services?
Option 2: First update BPM component to v1.1.14 or higher (https://github.com/cloudfoundry/bpm-release/pull/152#issuecomment-938235720) with the fix on current BOSH Director v271.2.0 before upgrading to BOSH Director v280.0.14.
Any other options? Greatly appreciate your suggestions here.
Updating BPM would still be an update of the instance, and as such would carry a chance of an improper Postgres shutdown.
@bgandon do you remember if there was a workaround that was used before the fix was implemented?
Hi @bgandon, as @rkoster confirmed, using Option 2 will likely run into the same improper Postgres shutdown. Could you please advise on the workaround you used before the BPM fix was implemented, if possible?
We're thinking of using Option 1 as a workaround: manually shutting down Postgres 10 on the BOSH Director VM before attempting the BOSH Director upgrade. Please help confirm whether the following steps will work.
- SSH into BOSH Director VM.
- Use monit to stop all processes except Postgres.
bosh/0:~# for name in "credhub" "uaa" "health_monitor" "director_nginx" "director_sync_dns" "director_scheduler" "blobstore_nginx" "nats" "director"; do monit stop "${name}"; done
bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 7d 2h 19m
Process 'nats' not monitored
Process 'postgres' running
Process 'blobstore_nginx' not monitored
Process 'director' not monitored
Process 'worker_1' not monitored
Process 'worker_2' not monitored
Process 'worker_3' not monitored
Process 'worker_4' not monitored
Process 'director_scheduler' not monitored
Process 'director_sync_dns' not monitored
Process 'director_nginx' not monitored
Process 'health_monitor' not monitored
Process 'uaa' not monitored
Process 'credhub' not monitored
System 'system_be0914a6-1473-47f1-58d9-4f3aacbe2ab5' running
- Unmonitor the Postgres process so that monit won't restart it when Postgres is later shut down directly with the "kill" command.
bosh/0:~# monit unmonitor postgres
bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 7d 2h 54m
Process 'nats' not monitored
Process 'postgres' not monitored
Process 'blobstore_nginx' not monitored
Process 'director' not monitored
Process 'worker_1' not monitored
Process 'worker_2' not monitored
Process 'worker_3' not monitored
Process 'worker_4' not monitored
Process 'director_scheduler' not monitored
Process 'director_sync_dns' not monitored
Process 'director_nginx' not monitored
Process 'health_monitor' not monitored
Process 'uaa' not monitored
Process 'credhub' not monitored
System 'system_be0914a6-1473-47f1-58d9-4f3aacbe2ab5' running
- Shut down Postgres using the "kill" command with the SIGINT signal (fast shutdown mode).
bosh/0:~# postgres_pid=$(/var/vcap/packages/bpm/bin/bpm pid postgres-10) && kill -s SIGINT "${postgres_pid}"
- Check the Postgres database cluster state and ensure it has shut down properly: the state should be "shut down" rather than "in production".
bosh/0:~# su - vcap -c "/var/vcap/packages/postgres-10/bin/pg_controldata -D /var/vcap/store/postgres-10" | grep -F "Database cluster state"
Database cluster state: shut down
- If the Postgres database cluster state is "shut down", exit the BOSH Director VM and proceed with the BOSH Director upgrade as usual.
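The steps above can be collected into one script. This is only a sketch of the proposed workaround, not a confirmed procedure: it assumes it is run as root on the Director VM, the function names (cluster_state, wait_for_clean_shutdown, stop_postgres_cleanly) are hypothetical, and the retry count and sleep interval are arbitrary choices. The process names, paths, and commands are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: consolidates the manual steps above. Run as root on the
# Director VM. Function names are hypothetical; paths come from the steps.

PG_CONTROLDATA=/var/vcap/packages/postgres-10/bin/pg_controldata
PG_DATA_DIR=/var/vcap/store/postgres-10

# Read pg_controldata output on stdin and print only the value of the
# "Database cluster state" field (for example "shut down").
cluster_state() {
  sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Poll pg_controldata until the cluster reports "shut down".
# $1 is the number of attempts (assumed default: 30, two seconds apart).
wait_for_clean_shutdown() {
  local attempts="${1:-30}" state="" i
  for ((i = 0; i < attempts; i++)); do
    state="$(su - vcap -c "${PG_CONTROLDATA} -D ${PG_DATA_DIR}" | cluster_state)"
    [[ "${state}" == "shut down" ]] && return 0
    sleep 2
  done
  echo "Postgres cluster state is still '${state}'" >&2
  return 1
}

stop_postgres_cleanly() {
  local name
  # Stop every process except Postgres, as in the monit loop above.
  for name in credhub uaa health_monitor director_nginx director_sync_dns \
              director_scheduler blobstore_nginx nats director; do
    monit stop "${name}"
  done
  # Keep monit from restarting Postgres, then request a fast shutdown.
  monit unmonitor postgres
  kill -s SIGINT "$(/var/vcap/packages/bpm/bin/bpm pid postgres-10)"
  wait_for_clean_shutdown
}
```

After sourcing the file, calling stop_postgres_cleanly returns 0 only once pg_controldata reports "shut down". Note that monit stop returns before the process has actually exited, so checking monit summary between steps, as in the transcript above, is still worthwhile.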