
Oracle 19.3.0 not getting deployed on AKS cluster because of Readiness probe failure

Open JithuPaul24 opened this issue 3 years ago • 3 comments

What is happening?

I'm trying to deploy Oracle 19.3.0 to an AKS cluster with persistence enabled. When Helm deploys the chart, the pod status is Running, but the pod never becomes ready: the readiness probe keeps failing inside /opt/oracle/checkDBStatus.sh.

C:\Users\paju\Desktop\Oracle_db_helm>kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
oracle19c-oracle-db-5c59477cdb-br2m2   0/1     Running   0          23h

C:\Users\paju\Desktop\Oracle_db_helm>kubectl describe pods oracle19c-oracle-db-5c59477cdb-br2m2
Name:         oracle19c-oracle-db-5c59477cdb-br2m2
Namespace:    habs-namespace
Priority:     0
Node:         aks-xxxx-27628362-vmss000002/10.95.2.132
Start Time:   Thu, 13 Jan 2022 11:07:01 -0500
Labels:       app=oracle19c-oracle-db
              pod-template-hash=5c59477cdb
Annotations:
Status:       Running
IP:           10.95.2.155
IPs:
  IP:           10.95.2.155
Controlled By:  ReplicaSet/oracle19c-oracle-db-5c59477cdb
Containers:
  oracle-db:
    Container ID:   containerd://32d84ff46a2d3ed77b05aedd07dce79a95ba91073005ca70eb6103865e881b15
    Image:          container-registry.oracle.com/database/enterprise:19.3.0.0
    Image ID:       container-registry.oracle.com/database/enterprise@sha256:b1b3616864050409f95211d0c8196e1026c95fcd5e3e42985b1378f92394b4a7
    Ports:          1521/TCP, 5500/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 13 Jan 2022 11:12:36 -0500
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/bin/sh -c if [ -f $ORACLE_BASE/checkDBLockStatus.sh ]; then $ORACLE_BASE/checkDBLockStatus.sh ; else $ORACLE_BASE/checkDBStatus.sh; fi ] delay=20s timeout=20s period=40s #success=1 #failure=3
    Environment:
      SVC_HOST:             oracle19c-oracle-db
      SVC_PORT:             1521
      ORACLE_SID:           ORCLCDB
      ORACLE_PDB:           ORCLPDB1
      ORACLE_PWD:           <set to the key 'oracle_pwd' in secret 'oracle19c-oracle-db'>  Optional: false
      ORACLE_CHARACTERSET:  AL32UTF8
      ORACLE_EDITION:       enterprise
      ENABLE_ARCHIVELOG:    false
    Mounts:
      /opt/oracle/oradata from datamount (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mw4cq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datamount:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  oracle19c-oracle-db
    ReadOnly:   false
  kube-api-access-mw4cq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From     Message
  Warning  Unhealthy  20s (x3115 over 22h)  kubelet  (combined from similar events): Readiness probe failed: [2022:01:14 15:07:41]: Connecting to the lock process /tmp/.ORCLCDB.exist_lck
[2022:01:14 15:07:41]: Lock held .ORCLCDB.exist_lck
ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
/opt/oracle/checkDBStatus.sh: line 26: sqlplus: command not found

What to expect?

I expect the pod to become ready and to create a PDB based on the values I provided in the values.yaml file.

How to reproduce?

C:\Users\xxx\Desktop\Oracle_db_helm>helm version
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}

C:\Users\xxx\Desktop\Oracle_db_helm>kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-18T19:30:35Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

The storage class is:

{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "my-azurefile-jp",
    "uid": "xxxxx",
    "resourceVersion": "3980480",
    "creationTimestamp": "2022-01-07T16:24:50Z",
    "managedFields": [
      {
        "manager": "kubectl-replace",
        "operation": "Update",
        "apiVersion": "storage.k8s.io/v1",
        "time": "2022-01-07T16:24:50Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:mountOptions": {},
          "f:parameters": {
            ".": {},
            "f:location": {},
            "f:skuName": {},
            "f:storageAccount": {}
          },
          "f:provisioner": {},
          "f:reclaimPolicy": {},
          "f:volumeBindingMode": {}
        }
      }
    ]
  },
  "provisioner": "kubernetes.io/azure-file",
  "parameters": {
    "location": "Canada Central",
    "skuName": "Standard_LRS",
    "storageAccount": "xxx"
  },
  "reclaimPolicy": "Delete",
  "mountOptions": [
    "dir_mode=0777",
    "file_mode=0777",
    "mfsymlinks",
    "cache=strict",
    "actimeo=30"
  ],
  "volumeBindingMode": "Immediate"
}

The values.yaml file is:

##This parameter changes the ORACLE_SID of the database. The default value is set to ORCLCDB.
oracle_sid: ORCLCDB

##This parameter modifies the name of the PDB. The default value is set to ORCLPDB1.
oracle_pdb: ORCLPDB1

##The Oracle Database SYS, SYSTEM and PDB_ADMIN password. Defaults to a randomly generated password
oracle_pwd:

##The character set to use when creating the database. Defaults to AL32UTF8
oracle_characterset: AL32UTF8

##The database edition (default: enterprise)
oracle_edition: enterprise

##Enable archive log mode when creating the database
enable_archivelog: false

##Enable persistence using Persistent Volume Claims
##ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
persistence:
  ##Oracle database data Persistent Volume Storage Class, nfs or block
  storageClass: "my-azurefile-jp"
  size: 30Gi

##Deploy only on nodes in a particular availability domain, eg PHX-AD-1 on OCI
##Leave empty if there is no such requirement
availabilityDomain:

##Deploy multiple replicas for fast fail over
replicas: 1

##deploy LoadBalancer service
loadBalService: false

##name of image
image: container-registry.oracle.com/database/enterprise:19.3.0.0

##image pull policy, IfNotPresent or Always
imagePullPolicy: Always

##container registry login/password
imagePullSecrets: regcred

##Deploy only on nodes having required labels .
##Format label_name : label_value . eg pool: sidb
##Leave empty if there is no such requirement
nodeLabels:
  ##agentpool: apppool

Please let me know if you need anything more for debugging.

Posted by JithuPaul24 on Jan 14 '22

[2022:01:13 16:12:37]: Acquiring lock .ORCLCDB.create_lck with heartbeat 30 secs
[2022:01:13 16:12:37]: Lock acquired
[2022:01:13 16:12:37]: Starting heartbeat
[2022:01:13 16:12:37]: Lock held .ORCLCDB.create_lck
ORACLE EDITION: ENTERPRISE

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 13-JAN-2022 16:12:37

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Starting /opt/oracle/product/19c/dbhome_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 19.0.0.0.0 - Production
System parameter file is /opt/oracle/product/19c/dbhome_1/network/admin/listener.ora
Log messages written to /opt/oracle/diag/tnslsnr/oracle19c-oracle-db-5c59477cdb-br2m2/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                13-JAN-2022 16:12:37
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/19c/dbhome_1/network/admin/listener.ora
Listener Log File         /opt/oracle/diag/tnslsnr/oracle19c-oracle-db-5c59477cdb-br2m2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
The listener supports no services
The command completed successfully
Prepare for db operation
8% complete
Copying database files
31% complete
100% complete
[FATAL] Recovery Manager failed to restore datafiles. Refer logs for details.
8% complete
0% complete
Look at the log file "/opt/oracle/cfgtoollogs/dbca/ORCLCDB/ORCLCDB.log" for further details.
[ 2022-01-13 16:12:46.362 UTC ] Prepare for db operation
DBCA_PROGRESS : 8%
[ 2022-01-13 16:12:46.572 UTC ] Copying database files
DBCA_PROGRESS : 31%
DBCA_PROGRESS : 100%
[ 2022-01-13 16:14:30.087 UTC ] [FATAL] Recovery Manager failed to restore datafiles. Refer logs for details.
DBCA_PROGRESS : 8%
DBCA_PROGRESS : 0%

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jan 13 16:14:30 2022
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL>    ALTER SYSTEM SET control_files='/opt/oracle/oradata/ORCLCDB/control01.ctl' scope=spfile
*
ERROR at line 1:
ORA-32001: write to SPFILE requested but no SPFILE is in use


SQL>
System altered.

SQL>    ALTER PLUGGABLE DATABASE ORCLPDB1 SAVE STATE
*
ERROR at line 1:
ORA-01109: database not open


SQL> BEGIN DBMS_XDB_CONFIG.SETGLOBALPORTENABLED (TRUE); END;

      *
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_XDB_CONFIG.SETGLOBALPORTENABLED' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored


SQL> SQL>
Session altered.

SQL>    CREATE USER OPS$oracle IDENTIFIED EXTERNALLY
*
ERROR at line 1:
ORA-01109: database not open


SQL>    GRANT CREATE SESSION TO OPS$oracle
*
ERROR at line 1:
ORA-01109: database not open


SQL>    GRANT SELECT ON sys.v_$pdbs TO OPS$oracle
*
ERROR at line 1:
ORA-01109: database not open


SQL>    GRANT SELECT ON sys.v_$database TO OPS$oracle
*
ERROR at line 1:
ORA-01109: database not open


SQL>    ALTER USER OPS$oracle SET container_data=all for sys.v_$pdbs container = current
*
ERROR at line 1:
ORA-01109: database not open


SQL> SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
/opt/oracle/checkDBStatus.sh: line 26: sqlplus: command not found
mv: cannot stat '/opt/oracle/product/19c/dbhome_1/dbs/spfileORCLCDB.ora': No such file or directory
mv: cannot stat '/opt/oracle/product/19c/dbhome_1/dbs/orapwORCLCDB': No such file or directory
mv: preserving times for '/opt/oracle/oradata/dbconfig/ORCLCDB/sqlnet.ora': Operation not permitted
mv: preserving permissions for '/opt/oracle/oradata/dbconfig/ORCLCDB/sqlnet.ora': Operation not permitted
mv: preserving times for '/opt/oracle/oradata/dbconfig/ORCLCDB/listener.ora': Operation not permitted
mv: preserving permissions for '/opt/oracle/oradata/dbconfig/ORCLCDB/listener.ora': Operation not permitted
mv: preserving times for '/opt/oracle/oradata/dbconfig/ORCLCDB/tnsnames.ora': Operation not permitted
mv: preserving permissions for '/opt/oracle/oradata/dbconfig/ORCLCDB/tnsnames.ora': Operation not permitted
mv: preserving times for '/opt/oracle/oradata/dbconfig/ORCLCDB/.docker_enterprise': Operation not permitted
mv: preserving permissions for '/opt/oracle/oradata/dbconfig/ORCLCDB/.docker_enterprise': Operation not permitted

Executing user defined scripts
/opt/oracle/runUserScripts.sh: running /opt/oracle/scripts/extensions/setup/savePatchSummary.sh

/opt/oracle/runUserScripts.sh: running /opt/oracle/scripts/extensions/setup/swapLocks.sh
[2022:01:13 16:14:34]: Releasing lock .ORCLCDB.create_lck
[2022:01:13 16:14:34]: Lock released .ORCLCDB.create_lck
[2022:01:13 16:14:34]: Acquiring lock .ORCLCDB.exist_lck with heartbeat 30 secs
[2022:01:13 16:14:34]: Lock acquired
[2022:01:13 16:14:34]: Starting heartbeat
[2022:01:13 16:14:34]: Lock held .ORCLCDB.exist_lck

DONE: Executing user defined scripts

ORACLE_HOME = [/home/oracle] ? ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID .
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base remains unchanged with value /opt/oracle
/opt/oracle/checkDBStatus.sh: line 26: sqlplus: command not found
#####################################
########### E R R O R ###############
DATABASE SETUP WAS NOT SUCCESSFUL!
Please check output for further info!
########### E R R O R ###############
#####################################
The following output is now a tail of the alert.log:
2022-01-13T16:13:12.305142+00:00
Successful mount of redo thread 1, with mount id 127634132
2022-01-13T16:13:12.305724+00:00
Database mounted in Exclusive Mode
Lost write protection disabled
.... (PID:590): Using STANDBY_ARCHIVE_DEST parameter default value as /opt/oracle/product/19c/dbhome_1/dbs/arch [krsd.c:18141]
Create Relation IPS_PACKAGE_UNPACK_HISTORY
Completed: ALTER DATABASE   MOUNT
2022-01-13T16:14:30.810539+00:00
ALTER SYSTEM SET local_listener='' SCOPE=MEMORY;

Posted by JithuPaul24 on Jan 14 '22

@JithuPaul24 Please try with Azure Disk storage. With Azure Files, even if the persistent volume has 0777 permissions, the files and directories are owned by root, even when another user (here, oracle) creates them. This causes problems when setting up the database.
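A minimal sketch of a storage class backed by Azure Disk, assuming the Azure Disk CSI driver (`disk.csi.azure.com`) is enabled on the cluster, as it is by default on recent AKS versions. The name `oracle-azuredisk` is hypothetical; AKS also ships a built-in `managed-csi` class that can be used instead of defining a new one. Azure Disk volumes only support the ReadWriteOnce access mode, so the claim that uses this class must request that mode.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: oracle-azuredisk          # hypothetical name, not from the chart
provisioner: disk.csi.azure.com   # Azure Disk CSI driver
parameters:
  skuName: StandardSSD_LRS        # Premium_LRS is another common choice
reclaimPolicy: Delete
# Delay provisioning until a pod is scheduled, so the disk is created
# in the same zone as the node that will attach it
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```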

Posted by abhisbyk on Mar 10 '22

@JithuPaul24 Any update?

Posted by abhisbyk on Apr 27 '22

@abhisbyk Sorry to resurrect this, but I can provide the information needed here, as I am having the exact same problem. The Helm chart in the helm-charts directory requests ReadWriteMany access for the persistent volume claim, and this is hard-coded in the template. managed-csi (the storage class for Azure Disk) cannot provide ReadWriteMany; it only supports ReadWriteOnce. The only storage class that can provide ReadWriteMany is azurefile-csi.
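For comparison, a claim that Azure Disk can satisfy would look roughly like the sketch below. This is not the chart's actual template: the claim name is taken from the `ClaimName` in the `kubectl describe` output above, the size from `persistence.size` in values.yaml, and `managed-csi` is the built-in AKS Azure Disk class. The relevant difference from the chart is the access mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oracle19c-oracle-db     # ClaimName reported by kubectl describe
spec:
  accessModes:
    - ReadWriteOnce             # Azure Disk cannot serve ReadWriteMany
  storageClassName: managed-csi
  resources:
    requests:
      storage: 30Gi             # matches persistence.size in values.yaml
```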

Edit: If necessary I can open another issue for this, but it is the exact same problem. The chart has not been updated since December 2021, shortly before this issue was created.

Posted by kaorihinata on Feb 08 '23

This issue still exists. I tried the method @abhisbyk suggested, but it still didn't work for me. I'm now running without persistence, since mine was a QA database that can live without persistent storage.

Posted by JithuPaul24 on Feb 08 '23

@JithuPaul24 You can successfully deploy this chart by setting the storageClass to managed-csi, modifying the chart template to set the accessMode to ReadWriteOnce (in the persistent volume claim template), and ensuring that you don't create more than one replica, but that definitely enters into "you're on your own" territory. Ideally this chart should be more flexible than it currently is.
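A sketch of the values.yaml side of this workaround, using the keys from the values file posted earlier in this issue. It is only part of the fix: the access mode is hard-coded in the chart's persistent volume claim template, so that template still has to be edited to request ReadWriteOnce.

```yaml
persistence:
  ## Azure Disk storage class; supports ReadWriteOnce only
  storageClass: "managed-csi"
  size: 30Gi

## A ReadWriteOnce volume can be mounted read-write by a single node only,
## so keep a single database replica
replicas: 1
```

Note that an existing claim's storage class cannot be changed in place, so a claim created earlier with `my-azurefile-jp` (and the data on it) would have to be deleted before redeploying.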

Posted by kaorihinata on Feb 08 '23