concourse-bosh-deployment
BBR backup CredHub fails due to hostname verification failure with internal postgres server's certificate
Steps to reproduce:
- Deploy Concourse with an internal postgres server:

```shell
bosh -d my-deployment deploy ./cluster/concourse.yml \
  -l variables.yml \
  -l versions.yml \
  -o ./cluster/operations/basic-auth.yml \
  -o ./cluster/operations/backup-atc-colocated-web.yml \
  -o ./cluster/operations/tls-vars.yml \
  -o ./cluster/operations/tls.yml \
  -o ./cluster/operations/privileged-https.yml \
  -o ./cluster/operations/uaa.yml \
  -o ./cluster/operations/credhub-colocated.yml \
  -o ./cluster/operations/secure-internal-postgres.yml \
  -o ./cluster/operations/secure-internal-postgres-uaa.yml \
  -o ./cluster/operations/secure-internal-postgres-bbr.yml \
  -o ./cluster/operations/secure-internal-postgres-credhub.yml \
  -o ./cluster/operations/backup-credhub-web.yml
```
- Run bbr:

```shell
bbr deployment \
  --target SOME-TARGET-IP \
  --deployment my-deployment \
  --username bbr_client \
  --password MY-PASSWORD \
  --ca-cert root_ca_certificate \
  backup
```
- See that the BBR backup of CredHub failed with an error like this:

```
psql: server certificate for "q-s0.db.concourse.concourse.bosh" does not match host name "192.168.1.152".
```
And see that CredHub produces a bbr config json that looks like this (where the host is an IP address):

```json
{
  "username": "credhub",
  "password": "xxxx",
  "database": "credhub",
  "adapter": "postgres",
  "host": "192.168.1.152",
  "port": 5432,
  "tls": {
    "cert": {
      "ca": "-----BEGIN CERTIFICATE-----xxxxE-----\n"
    }
  }
}
```
Diagnosis
- The internal `postgres` server is deployed with a certificate generated by the config specified in `cluster/operations/secure-internal-postgres.yml`. We verified that the generated server cert has the DNS address as its Common Name and SAN:

  ```
  Common Name: q-s0.db.infra.concourse-colocated.bosh
  Subject Alternative Names: q-s0.db.infra.concourse-colocated.bosh
  ```
- The CredHub BBR job generates the bbr config json by consuming a BOSH link of type `database` provided by the internal `postgres` job. The CredHub BBR job reads the host via `DATABASE-LINK.instances[0].address`.
- Depending on the BOSH director version, `DATABASE-LINK.instances[0].address` may return an IP address (e.g. `192.168.1.152`) or a DNS address (e.g. `q-s0.db.concourse.concourse.bosh`) (see the BOSH links doc and DNS links doc). In this case, `DATABASE-LINK.instances[0].address` returns an IP address.
- BBR receives a bbr config json containing the DB's IP address and connects to that IP address with hostname verification turned on (see the psql doc's explanation of the `verify-full` sslmode). But since the postgres server cert has only the DNS address (`q-s0.db.infra.concourse-colocated.bosh`) as its Common Name and SAN, hostname verification fails.
Related
A similar issue was reported by another user here.
Potential Fixes:
- Integrate or document the `features.use_dns_addresses` deployment manifest property (see doc) so that `SOME-DB-LINK.instances[0].address` returns a DNS address.
- (If possible) configure BBR to disable hostname verification when talking to an internal postgres server. See the psql doc's explanation that the `verify-full` sslmode, which includes hostname verification, is not required when using a local CA or self-signed certs.
- Update `cluster/operations/secure-internal-postgres.yml` so that the internal postgres server's cert has both its DNS address and its IP address as SANs; then hostname verification will work regardless of whether BBR talks to the postgres server by DNS address or IP address.
cc: @bruce-ricard
Hey @peterhaochen47,

I was able to find a workaround; see my notes below on Potential Fixes 1) and 2).
Potential Fix 1)
This suggestion did not work for me. After adding

```yaml
features:
  use_dns_addresses: true
```

to my Concourse manifest, re-deploying, and performing the backup, I received the following error:
In the above error output, the BOSH DNS name is used. However, the full DNS name is returned, and unfortunately it still does not match the Common Name used on the postgres server certificate.
There is another property in the BOSH docs, `features.use_short_dns_addresses`, which states that it is used for certificate common names, so I tried that as well. However, this also did not work.
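For reference, both flags sit together in the manifest's top-level `features` block (flag names are per the BOSH deployment manifest docs):

```yaml
features:
  use_dns_addresses: true        # link addresses return DNS names instead of IPs
  use_short_dns_addresses: true  # shorter DNS names, also used for certificate common names
```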
When adding this property to the manifest and re-deploying Concourse, the deployment fails when starting the `uaa` job on the web VM with the following error:
Potential Fix 2)
I was able to get Potential Fix 2) working using the `bbr-postgres-db` job from the postgres BOSH release.
The `bbr-postgres-db` job has a property, `ssl_verify_hostname`, to disable SSL hostname verification.
It appears no such option exists for the `bbr-credhubdb` job, which is why the backup fails with the hostname verification error when consuming that job from the credhub release.
Below are the steps I implemented:
- Created a new operations file called `backup-postgres.yml` with the following config:
```yaml
# Add release for backup-and-restore-sdk
- type: replace
  path: /releases/name=backup-and-restore-sdk?
  value:
    name: backup-and-restore-sdk
    version: ((bbr_sdk_version))
    url: https://bosh.io/d/github.com/cloudfoundry-incubator/backup-and-restore-sdk-release?v=((bbr_sdk_version))

# Add the database-backup-restorer job to the db VM
- type: replace
  path: /instance_groups/name=db/jobs/-
  value:
    release: backup-and-restore-sdk
    name: database-backup-restorer

# Add the bbr-postgres-db job to enable BBR backups for the postgres db.
# NOTE: When TLS is enabled for postgres, the BBR backup fails with a
# hostname verification error, so we must set the ssl_verify_hostname
# property to false (it defaults to true).
- type: replace
  path: /instance_groups/name=db/jobs/-
  value:
    name: bbr-postgres-db
    release: postgres
    properties:
      postgres:
        databases:
        - credhub
        - atc
        - uaa
        ssl_verify_hostname: false
```
- Removed the `backup-atc.yml` ops file from my deploy script and added my newly created `backup-postgres.yml` ops file:
```shell
#!/bin/bash
bosh deploy \
  -d concourse ./cluster/concourse.yml \
  -l vars.yml \
  -l versions.yml \
  -o ./cluster/operations/basic-auth.yml \
  -o ./cluster/operations/privileged-https.yml \
  -o ./cluster/operations/tls.yml \
  -o ./cluster/operations/encryption.yml \
  -o ./cluster/operations/uaa.yml \
  -o ./cluster/operations/credhub-colocated.yml \
  -o ./cluster/operations/secure-internal-postgres.yml \
  -o ./cluster/operations/secure-internal-postgres-bbr.yml \
  -o ./cluster/operations/secure-internal-postgres-uaa.yml \
  -o ./cluster/operations/secure-internal-postgres-credhub.yml \
  -o ./backup-postgres.yml
```
- Re-deployed Concourse.
- Once Concourse re-deployed, I ran my backup-concourse job and was able to successfully take a backup of the postgres database containing the ATC, CredHub, and UAA databases.

After taking the backup, I untarred the tgz/tar files and could see the `.sql` files for each database:
```
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100# ls -l backup/
total 160
drwx------ 1 root root      0 Feb 26 04:50 concourse_20210226T045031Z
-rw-r--r-- 1 root root 163236 Feb 26 04:50 product_concourse_2021-02-26-04-50-54.tgz
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100# cd backup/
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100/backup# tar -xvf product_concourse_2021-02-26-04-50-54.tgz
concourse_20210226T045031Z/db-0-bbr-postgres-db.tar
concourse_20210226T045031Z/manifest.yml
concourse_20210226T045031Z/metadata
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100/backup# ls -l concourse_20210226T045031Z/
total 368
-rw-r--r-- 1 root root 358400 Feb 26 04:50 db-0-bbr-postgres-db.tar
-rw-r--r-- 1 root root   8276 Feb 26 04:50 manifest.yml
-rw-r--r-- 1 root root    460 Feb 26 04:50 metadata
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100/backup# cd concourse_20210226T045031Z/
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100/backup/concourse_20210226T045031Z# tar -xvf db-0-bbr-postgres-db.tar
./
./postgres_uaa.sql
./postgres_credhub.sql
./postgres_atc.sql
root@3333a357-0504-458c-51f0-a1c4dac8a6f1:/tmp/build/c03f8100/backup/concourse_20210226T045031Z# ls -l
total 712
-rw-r--r-- 1 root root 358400 Feb 26 04:50 db-0-bbr-postgres-db.tar
-rw-r--r-- 1 root root   8276 Feb 26 04:50 manifest.yml
-rw-r--r-- 1 root root    460 Feb 26 04:50 metadata
-rw-r--r-- 1 root root 261771 Feb 26 04:50 postgres_atc.sql
-rw-r--r-- 1 root root  30200 Feb 26 04:50 postgres_credhub.sql
-rw-r--r-- 1 root root  53704 Feb 26 04:50 postgres_uaa.sql
```
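For completeness, restoring this artifact would use the same `bbr` invocation with the `restore` subcommand and an `--artifact-path`; the target, credentials, and artifact directory below are placeholders taken from earlier in this issue (a sketch, not verified against this deployment):

```shell
# Hypothetical restore sketch; placeholder values match the backup command above.
bbr deployment \
  --target SOME-TARGET-IP \
  --deployment concourse \
  --username bbr_client \
  --password MY-PASSWORD \
  --ca-cert root_ca_certificate \
  restore \
  --artifact-path ./concourse_20210226T045031Z
```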
Let me know if you have any questions on this workaround. I hope this helps!
After speaking with the BOSH team: solution #3 is not feasible, as BOSH links don't support returning an IP and a DNS hostname at the same time.
So @mjenk664's potential solution #2 seems the best so far. At least you can keep the `backup-postgres.yml` ops file for the next re-deployment.