backup-container
                        Postgres backup verification error - The server failed to start
I executed the backup for the first time this morning and I cannot get the verification to work. We do not have a public schema in our database; hopefully that is not required, as it was not for the backup. I have attempted the verification several times. Any help with steps I can take to investigate the cause of this issue is appreciated.
Namespace = 30245e-dev.
sh-4.2$ ./backup.sh -s -v all
Verifying backup ...
Settings:
- Database: postgres=patroni-master-dev/mals
- Backup file: /backups/daily/2021-07-27/patroni-master-dev-mals_2021-07-27_11-12-00.sql.gz
Already clean ...
waiting for server to start.....................................................................................................................
The server failed to start within 0h:2m:0s.
The server started in 0h:2m:0s.
waiting for server to shut down.... done
server stopped
Cleaning up ...
[!!ERROR!!] - Backup verification failed: /backups/daily/2021-07-27/patroni-master-dev-mals_2021-07-27_11-12-00.sql.gz
Elapsed time: 0h:2m:1s - Status Code: 1
In the run above, the verification is failing because the local server instance (used for validation) did not start within the timeout period set by DATABASE_SERVER_TIMEOUT; in this case 2 minutes. You can increase the timeout by setting DATABASE_SERVER_TIMEOUT to a higher value.
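For instance, a minimal sketch of an environment entry in the deployment configuration (the value is in seconds; 300 is just an example):

```yaml
# Example only: raises the verification server startup timeout to 5 minutes.
- name: DATABASE_SERVER_TIMEOUT
  value: "300"
```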

There are other factors that can cause the server to start slowly, such as CPU and memory resource allocation. The backup-container performs best when allowed to use "best-effort" resource allocation, which is the default specified in the example template: https://github.com/BCDevOps/backup-container/blob/master/openshift/templates/backup/backup-deploy.yaml#L428-L447
If your database does not use a "public" schema, you can specify the name of your database's schema via the TABLE_SCHEMA variable. This tells the backup-container which schema to back up and verify.
https://github.com/BCDevOps/backup-container/blob/master/openshift/templates/backup/backup-deploy.yaml#L293-L297
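For example, in a param file this could look like the following (the schema name here is a placeholder):

```
TABLE_SCHEMA                          =my_app_schema
```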
Thanks Wade. Those four parameter values are set to the defaults and the TABLE_SCHEMA parameter is set to the appropriate schema. I added the timeout parameter with 300 seconds and the verification now times out in 5 minutes.
I have updated the Dockerfile to pull Postgres v13 as our app database is v13. Hopefully that isn't the cause of the issue.
FROM centos/postgresql-13-centos7
That could be the issue. We haven't tested the backup-container with that version. It could be that some startup parameters have changed. What's the version of the database you're backing up?
Full version:
 PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
I will redo the backup build using v12 to see if it is compatible.
It's typically preferable to have the backup-container use the same or newer version as the database you are backing up. Would you be up for troubleshooting the startup issues and contributing the upgrade back to the project? I can provide you with some guidance on how to go about troubleshooting.
Certainly. Keep in mind that this is my first and only experience with OpenShift and I am still learning how everything works/interacts.
Open a terminal on a running instance of your backup-container. Then run the following command after replacing <DatabaseName>, <DatabaseUser> and <DatabasePassword> with real values:
POSTGRESQL_DATABASE=<DatabaseName> \
POSTGRESQL_USER=<DatabaseUser> \
POSTGRESQL_PASSWORD=<DatabasePassword> \
run-postgresql
For the purpose of troubleshooting it does not really matter what the values are for <DatabaseName>, <DatabaseUser> and <DatabasePassword>. Running the command will start the server, and you'll be able to see the log output and hopefully find out where it's failing.
You're basically mimicking what's done by this function, https://github.com/BCDevOps/backup-container/blob/master/docker/backup.postgres.plugin#L104-L123, without suppressing the log output.
I'm interested in this as well, using Postgis extension of Postgres 13, and just starting to look at using backup container.
Below is the output;
sh-4.2$ POSTGRESQL_DATABASE=mals \
> POSTGRESQL_USER=postgres \
> POSTGRESQL_PASSWORD=##### \
> run-postgresql
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... PST8PDT
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
    pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start
waiting for server to start....2021-07-28 13:25:13.746 PDT [38077] LOG:  redirecting log output to logging collector process
2021-07-28 13:25:13.746 PDT [38077] HINT:  Future log output will appear in directory "log".
 done
server started
/var/run/postgresql:5432 - accepting connections
createuser: error: creation of new role failed: ERROR:  role "postgres" already exists
> I'm interested in this as well, using Postgis extension of Postgres 13, and just starting to look at using backup container.
Once the upgrade is complete, the Postgis extension would need to be installed on the backup-container for your use case. It should be a simple update to the Dockerfile.
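A minimal sketch of what that Dockerfile update might look like, assuming the Debian-based postgres:13 base image mentioned later in this thread (the package name is the stock Debian PostGIS package and is an assumption, not something the project currently ships):

```dockerfile
# Hypothetical sketch only: installs the PostGIS packages on top of the
# Postgres 13 base image so dumps containing PostGIS types can be restored.
FROM postgres:13
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-13-postgis-3 \
    && rm -rf /var/lib/apt/lists/*
```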
@MScallion, Other than the `createuser: error: creation of new role failed: ERROR: role "postgres" already exists` message, it looks like it's running. So your issue may be due to the way the container is configured.
What's the output from ./backup.sh -c?
Settings:
- Run mode: cron
- Backup strategy: rolling
- Current backup type: daily
- Backups to retain:
  - Daily: 1
  - Weekly: 1
  - Monthly: 1
- Current backup folder: /backups/daily/2021-07-28/
- Time Zone: PDT -0700
- Schedule:
  - 0 1 * * * default ./backup.sh -s
  - 0 4 * * * default ./backup.sh -s -v all
- Container Type: postgres
- Databases (filtered by container type):
  - postgres=patroni-master-dev/mals
- FTP server: not configured
- Webhook Endpoint: https://parcsystems.webhook.office.com/webhookb2/14b3e8f2-8035-4b3b-bf09-82f0b981616e@0d413f05-310c-4aad-8859-8943f3ded1f0/IncomingWebhook/8ac18ec5da754f5dbd7701dbb173c63e/9d0970f2-adfc-4c59-b638-053c71980087
- Environment Friendly Name: MALS
- Environment Name (Id): dev
Should the Environment Name be the namespace?
> Should the Environment Name be the namespace?
Yes, but that shouldn't affect the verification.
I accepted all of the default values for the build. Below is our param file for the deployment.
Side note: the webhook did not produce the desired notifications in Teams.
APP_NAME                              =mals-bkup
BACKUP_VOLUME_NAME                    =backup-mals
DAILY_BACKUPS                         ="1"
DATABASE_DEPLOYMENT_NAME              =patroni-dev
DATABASE_NAME                         ="mals"
DATABASE_PASSWORD_KEY_NAME            =app-db-owner-password
DATABASE_USER_KEY_NAME                =app-db-owner-username
ENVIRONMENT_NAME                      ="dev"
ENVIRONMENT_FRIENDLY_NAME             =""
IMAGE_NAMESPACE                       =30245e-tools
MONTHLY_BACKUPS                       ="1"
NAMESPACE_NAME                        ="30245e"
SOURCE_IMAGE_NAME                     =mals-bkup
TABLE_SCHEMA                          =mals_app
TAG_NAME                              =v1
WEBHOOK_URL                           ="https://parcsystems.webhook.office.com/webhookb2/14b3e8f2-8035-4b3b-bf09-82f0b981616e@0d413f05-310c-4aad-8859-8943f3ded1f0/IncomingWebhook/8ac18ec5da754f5dbd7701dbb173c63e/9d0970f2-adfc-4c59-b638-053c71980087"
WEEKLY_BACKUPS                        ="1"
#
#
#
# Parameter Defaults
#
#
# BACKUP_DIR                            =/backups/
# BACKUP_PERIOD                         =""
# BACKUP_STRATEGY                       =rolling
# BACKUP_VOLUME_CLASS                   =netapp-file-backup
# BACKUP_VOLUME_SIZE                    =5Gi
# CONFIG_FILE_NAME                      =backup.conf
# CONFIG_MAP_NAME                       =backup-conf
# CONFIG_MOUNT_PATH                     =/
# CPU_LIMIT                             ="0"
# CPU_REQUEST                           ="0"
# DATABASE_SERVICE_NAME                 ="" 
# FTP_PASSWORD                          =""
# FTP_SECRET_KEY                        =ftp-secret
# FTP_URL                               =""
# FTP_URL_HOST                          =
# FTP_USER                              =""
# MEMORY_LIMIT                          =0Mi
# MEMORY_REQUEST                        =0Mi
# MONGODB_AUTHENTICATION_DATABASE       =""
# MSSQL_SA_PASSWORD                     =
# NAME                                  =backup-postgres
# NUM_BACKUPS                           =""
# ROLE                                  =backup-container
# VERIFICATION_VOLUME_CLASS             =netapp-file-standard
# VERIFICATION_VOLUME_MOUNT_PATH        =/var/lib/pgsql/data
# VERIFICATION_VOLUME_NAME              =backup-verification
# VERIFICATION_VOLUME_SIZE              =1Gi
# WEBHOOK_URL_HOST                      =
When I updated the app database to v13 I did not contribute the changes back because I had split the database user into a DB owner and an application proxy user, and I understood that this could not be used because it was inconsistent with previous templates. Hopefully this is not an issue for the verification, as the backup appears to have accepted the user/password that I provided.
@MScallion, You are missing environment variables for PATRONI_MASTER_DEV_USER and PATRONI_MASTER_DEV_PASSWORD, as defined here: https://github.com/BCDevOps/backup-container#backupconf
Examples:
- DC: https://github.com/bcgov/orgbook-configurations/blob/master/openshift/templates/backup/backup-deploy.yaml#L128-L147
- backup.conf: https://github.com/bcgov/orgbook-configurations/blob/master/openshift/templates/backup/config/backup.conf#L39-L40
That would cause these lookups to return blank strings and the startup to fail: https://github.com/BCDevOps/backup-container/blob/master/docker/backup.postgres.plugin#L119-L120
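Following the pattern already used for DATABASE_USER/DATABASE_PASSWORD in the deployment template, the missing variables could be wired up roughly like this (the secret name and key names below are assumptions based on the values mentioned elsewhere in this thread):

```yaml
# Sketch only: maps the patroni-master-dev credentials to the environment
# variable names the backup scripts look up by naming convention.
- name: PATRONI_MASTER_DEV_USER
  valueFrom:
    secretKeyRef:
      name: patroni-dev
      key: app-db-owner-username
- name: PATRONI_MASTER_DEV_PASSWORD
  valueFrom:
    secretKeyRef:
      name: patroni-dev
      key: app-db-owner-password
```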
> When I updated the app database to v13 I did not contribute the changes because I split the database user into a DB owner and an application proxy user. I understood that this could not be used because it was inconsistent with previous templates. Hopefully this is not an issue for the verification as the backup appears to have accepted the user/password that I provided.
Are you saying you've modified the code, or just configured the container in a very specific way?
Thanks Wade, I don't see those in the backup-container repository, which is the repo that I cloned. I added the four parameters to the deployment template, but I am not sure how to update the config file. Currently I have defined the database as:
postgres=patroni-master-dev/mals
Should I add the two lines as-is? I.e. are these values the same for all projects, or should I update them to reflect project-specific components?
 db-bc:5432/aries-vcr
 wallet-bc:5432/agent_bc_wallet
Those values are a customization that needs to be made for each application of the container when you use the backup.conf file to define the backups and schedule. I was just providing examples of the customization done for another project.
The examples were to show:
- The DB_BC_USER and DB_BC_PASSWORD environment variables are associated with the db-bc:5432/aries-vcr entry in the backup.conf file.
- The WALLET_BC_USER and WALLET_BC_PASSWORD environment variables are associated with the wallet-bc:5432/agent_bc_wallet entry in the backup.conf file.
They map the credentials for the databases, via naming convention, for the scripts to pick up.
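The naming convention itself can be sketched as follows: take the service name from the backup.conf entry, upper-case it, replace hyphens with underscores, and append _USER / _PASSWORD (this is a sketch of my understanding of the convention, not code from the project):

```shell
# Sketch: derive the credential variable names for a backup.conf entry
# such as "postgres=patroni-master-dev/mals".
service="patroni-master-dev"
prefix=$(echo "${service}" | tr '[:lower:]' '[:upper:]' | tr '-' '_')
echo "${prefix}_USER"      # PATRONI_MASTER_DEV_USER
echo "${prefix}_PASSWORD"  # PATRONI_MASTER_DEV_PASSWORD
```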
> Are you saying you've modified the code, or just configured the container in a very specific way?
I replaced the APP_USER/APP_PASSWORD secret values with app-db-owner-username/app-db-owner-password and app-proxy-username/app-proxy-password in order to configure a superuser, a DB owner and an application user, in an attempt to comply with the principle of least privilege. I then administered the necessary role and grants for the app user.
I also updated the Dockerfile to reference:
FROM postgres:13
ENV PATRONI_VERSION=2.0.1
and to include permission changes that helped to resolve an error early on, possibly as a result of the changes to the users:
RUN chmod a+x /usr/share/scripts/patroni/health_check.sh
RUN chmod a+x /usr/share/scripts/patroni/post_init.sh
So for our application database we have used the following parameters in the deployment template:
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: ${DATABASE_DEPLOYMENT_NAME}
                  key: ${DATABASE_USER_KEY_NAME}
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ${DATABASE_DEPLOYMENT_NAME}
                  key: ${DATABASE_PASSWORD_KEY_NAME}
For which I have provided the following values:
DATABASE_DEPLOYMENT_NAME              =patroni-dev (Name of the Secret)
DATABASE_NAME                         ="mals"
DATABASE_PASSWORD_KEY_NAME            =app-db-owner-password
DATABASE_USER_KEY_NAME                =app-db-owner-username
Those are the default ones, which are used for the legacy mode of operation only. When using the newer cron mode of operation with the backup.conf file, you need to update your deployment configuration with the additional "credential" variables as laid out in the documentation here: https://github.com/BCDevOps/backup-container#backupconf. The examples I provided follow those conventions.
Understood, I will work to add those then post my results.
I had assumed that because the backups execute successfully, the credentials might not be the root cause of the issue. The backups complete successfully in ~2 seconds, condensing the 66 MB of DB storage into a 13 MB backup file. I just updated the secret username and password pair to use the postgres user and the issue persisted.
The readme states that the parameter names are prefixed with the service name. Does it have to be exact? If so, then I will likely need to alter our service names, as they currently include the environment, i.e. patroni-master-dev.
I am hoping that I can use PATRONI_MASTER_USER and PATRONI_MASTER_PASSWORD for all environments, without having to modify the existing service names.
The verification again timed out in 5m after I added those parameters to the deployment template and redeployed:
PATRONI_MASTER_DEV_USER
PATRONI_MASTER_DEV_PASSWORD
@WadeBarnes, I have cloned/updated the orgbook-containers repo and I am able to run both the backup and validation. I am now receiving a validation error stating that the application role, to which I grant all object privileges, does not exist. The grants are required because I have split the user into a DB owner and an application proxy user. I am investigating alternate dump file options, i.e. pg_dumpall and/or the -x option, to find the best fit for my current configuration and restore capabilities.