oracle-database-operator

Creating Single Instance Database Clone fails

Open andbos opened this issue 1 year ago • 5 comments

Creating Single Instance Database Clone fails with the following error:

[2023:12:05 08:44:20]: Acquiring lock .ORAW.create_lck with heartbeat 30 secs
[2023:12:05 08:44:20]: Lock acquired
[2023:12:05 08:44:20]: Starting heartbeat
[2023:12:05 08:44:20]: Lock held .ORAW.create_lck
ORACLE EDITION: ENTERPRISE
[WARNING] [DBT-11217] Unable to check available shared memory on specified node(s) ([10]).
Prepare for db operation
[FATAL] [DBT-06006] Unable to create directory: (/opt/oracle/oradata/ORAW).
   CAUSE: Proper permissions are not granted to create the directory or there is no space left in the volume.
[ 2023-12-05 08:44:32.566 UTC ] [WARNING] [DBT-11217] Unable to check available shared memory on specified node(s) ([10]).
[ 2023-12-05 08:46:43.404 UTC ] Prepare for db operation
[ 2023-12-05 08:46:43.473 UTC ] [FATAL] [DBT-06006] Unable to create directory: (/opt/oracle/oradata/ORAW).

LSNRCTL for Linux: Version 21.0.0.0.0 - Production on 05-DEC-2023 08:46:43

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
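
In the log above, the TNS listener errors come after the `[FATAL]` line, which is the first hard failure. The following is a minimal sketch for pulling just the fatal DBT codes out of a long pod log. The function name `fatal_dbt_codes` is made up for illustration and is not part of the operator.

```shell
# Print each distinct DBT error code that appears on a [FATAL] line of stdin.
# Hypothetical helper: the name is illustrative, not an operator or Oracle tool.
fatal_dbt_codes() {
  grep -F '[FATAL]' | grep -oE 'DBT-[0-9]+' | sort -u
}

# Possible usage against the failing clone pod (pod name taken from this thread;
# --previous reads the log of the last terminated container, if there is one):
#   kubectl -n oracle-database logs --previous wisla-mtqfs | fatal_dbt_codes
```

Warnings such as DBT-11217 are filtered out on purpose, since only the `[FATAL]` entries stop the database creation.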

The configuration file was based on https://github.com/oracle/oracle-database-operator/blob/main/config/samples/sidb/singleinstancedatabase_clone.yaml; I only changed name, namespace, sid and cloneFrom.

Before attempting to create the clone I had one Primary and two Physical standby instances running without issues:

kubectl -n oracle-database get singleinstancedatabase
NAME      EDITION      STATUS    ROLE               VERSION      CONNECT STR             TCPS CONNECT STR   OEM EXPRESS URL
szczyrk   Enterprise   Healthy   PHYSICAL_STANDBY   21.3.0.0.0   10.1.2.46:32533/ORAS    Unavailable        https://10.1.2.46:32428/em
ustron    Enterprise   Healthy   PHYSICAL_STANDBY   21.3.0.0.0   10.1.2.46:31519/ORAU    Unavailable        https://10.1.2.46:31105/em
zywiec    Enterprise   Healthy   PRIMARY            21.3.0.0.0   10.1.2.46:31761/ORAZ    Unavailable        https://10.1.2.46:30398/em

After executing kubectl apply on singleinstancedatabase_clone:

kubectl -n oracle-database get singleinstancedatabase
NAME      EDITION      STATUS     ROLE               VERSION       CONNECT STR             TCPS CONNECT STR   OEM EXPRESS URL
szczyrk   Enterprise   Healthy    PHYSICAL_STANDBY   21.3.0.0.0    10.1.2.46:32407/ORAS    Unavailable        https://10.1.2.46:31656/em
ustron    Enterprise   Healthy    PHYSICAL_STANDBY   21.3.0.0.0    10.1.2.46:32678/ORAU    Unavailable        https://10.1.2.46:31596/em
wisla     Enterprise   Creating   Unavailable        Unavailable   10.1.3.159:32557/ORAW   Unavailable        Unavailable
zywiec    Enterprise   Healthy    PRIMARY            21.3.0.0.0    10.1.1.7:32193/ORAZ     Unavailable        https://10.1.1.7:31245/em

kubectl -n oracle-database get pods
NAME            READY   STATUS             RESTARTS      AGE
szczyrk-wx4zj   1/1     Running            0             161m
ustron-9ifi2    1/1     Running            0             154m
wisla-mtqfs     0/1     CrashLoopBackOff   21 (4m ago)   139m
zywiec-vlf14    1/1     Running            0             3h17m

kubectl -n oracle-database get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
szczyrk   Bound    pvc-ce59ebf8-11b3-4b90-8493-3a9bcd1a8d06   10Gi       RWO            gp2            27m
ustron    Bound    pvc-55696dbb-714a-4c20-8bbb-8ba85314cf8a   10Gi       RWO            gp2            20m
wisla     Bound    pvc-283d044d-74f4-4289-b3b7-4cd769e35bfa   10Gi       RWO            gp2            5m13s
zywiec    Bound    pvc-3f4b83cb-d861-4963-8bae-5d7badf1eca6   10Gi       RWO            gp2            63m

Environment:

  • AWS EKS 1.25
  • StorageClass gp2, provisioner kubernetes.io/aws-ebs
  • oracle-database-operator 1.0.0

andbos avatar Dec 05 '23 11:12 andbos

Hi @andbos, you are cloning the primary sidb, right, and not one of the two standby sidbs? If so, please also share the operator pod logs.

IshaanDesai45 avatar Dec 22 '23 11:12 IshaanDesai45

Hi,

Yes, I cloned the primary instance. I tried again, now with only one instance in total (a Primary) prior to the clone operation.

$ kubectl --kubeconfig ~/.kube/config-sinch-op-smsf-1-andbos -n oracle-database get singleinstancedatabase
NAME     EDITION      STATUS    ROLE      VERSION      CONNECT STR             TCPS CONNECT STR   OEM EXPRESS URL
zywiec   Enterprise   Healthy   PRIMARY   21.3.0.0.0   10.1.3.159:31170/ORAZ   Unavailable        https://10.1.3.159:30994/em

$ kubectl -n oracle-database get pvc
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
zywiec   Bound    pvc-37535c5d-f6df-489f-b91b-ed3656bd6e8e   50Gi       RWO            gp2            8d

$ kubectl -n oracle-database get pods
NAME           READY   STATUS    RESTARTS   AGE
zywiec-vpxps   1/1     Running   0          8d

$ kubectl -n oracle-database apply -f ~/oracle/singleinstancedatabase_clone.yaml
singleinstancedatabase.database.oracle.com/wisla created

$ kubectl -n oracle-database get pods
NAME           READY   STATUS    RESTARTS      AGE
wisla-wnnrh    0/1     Running   1 (72s ago)   3m54s
zywiec-vpxps   1/1     Running   0             8d

$ kubectl -n oracle-database get pvc
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
wisla    Bound    pvc-1cec290e-f461-4b66-81fc-e0069a22480b   50Gi       RWO            gp2            3m58s
zywiec   Bound    pvc-37535c5d-f6df-489f-b91b-ed3656bd6e8e   50Gi       RWO            gp2            8d

$ kubectl -n oracle-database get singleinstancedatabase
NAME     EDITION      STATUS     ROLE          VERSION       CONNECT STR             TCPS CONNECT STR   OEM EXPRESS URL
wisla    Enterprise   Creating   Unavailable   Unavailable   10.1.1.7:30200/ORAW     Unavailable        Unavailable
zywiec   Enterprise   Healthy    PRIMARY       21.3.0.0.0    10.1.3.159:31170/ORAZ   Unavailable        https://10.1.3.159:30994/em

$ kubectl -n oracle-database logs -f wisla-wnnrh
Defaulted container "wisla" out of: wisla, init-permissions (init), init-wallet (init)
[2023:12:22 12:39:46]: Acquiring lock .ORAW.create_lck with heartbeat 30 secs
[2023:12:22 12:39:46]: Lock acquired
[2023:12:22 12:39:46]: Starting heartbeat
[2023:12:22 12:39:46]: Lock held .ORAW.create_lck
ORACLE EDITION: ENTERPRISE
[WARNING] [DBT-11217] Unable to check available shared memory on specified node(s) ([10]).
Prepare for db operation
[FATAL] [DBT-06006] Unable to create directory: (/opt/oracle/oradata/ORAW).
   CAUSE: Proper permissions are not granted to create the directory or there is no space left in the volume.
[ 2023-12-22 12:39:57.762 UTC ] [WARNING] [DBT-11217] Unable to check available shared memory on specified node(s) ([10]).
[ 2023-12-22 12:42:07.441 UTC ] Prepare for db operation
[ 2023-12-22 12:42:07.506 UTC ] [FATAL] [DBT-06006] Unable to create directory: (/opt/oracle/oradata/ORAW).

LSNRCTL for Linux: Version 21.0.0.0.0 - Production on 22-DEC-2023 12:42:07

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
$ date
Fri Dec 22 13:42:27 CET 2023

oracle-database-operator-controller-manager_log.txt singleinstancedatabase_clone.yaml.txt singleinstancedatabase.yaml.txt

I have uploaded the operator-controller-manager pod logs (taken with stern, so logs from all three pods are present) and my YAML files.

Best regards, Andreas

andbos avatar Dec 22 '23 12:12 andbos

@andbos please check whether /opt/oracle/oradata, the directory where the volume is mounted, has enough free space. If it does not, try expanding the underlying volume or using a new, larger volume.
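
One possible way to run that check, sketched under the assumption that `df` is available in the database container and that the pod stays up long enough for `kubectl exec` to connect. The helper name `used_pct` is hypothetical.

```shell
# Read POSIX-format `df -P` output on stdin and print the used percentage
# (the Capacity column of the first filesystem row) without the % sign.
# Hypothetical helper: the name is illustrative only.
used_pct() {
  awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Possible usage against the clone pod from this thread:
#   kubectl -n oracle-database exec wisla-wnnrh -- df -P /opt/oracle/oradata | used_pct
# The provisioned size of the claim can be read from the PVC itself:
#   kubectl -n oracle-database get pvc wisla -o jsonpath='{.status.capacity.storage}'
```

Note that DBT-06006 also lists missing permissions as a possible cause, so a low percentage here would point toward ownership of the mount rather than space.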

IshaanDesai45 avatar Jan 30 '24 09:01 IshaanDesai45

Hi,

Thanks. It turns out I had assigned insufficient storage (I tried to save money during testing). I doubled the storage to 100Gi, and after that the clone was created successfully. It is a bit strange that it was possible to create the initial primary with as little as 10Gi but not a clone. What's the minimum amount?

$ kubectl -n oracle-database apply -f ~/oracle/singleinstancedatabase_clone.yaml
singleinstancedatabase.database.oracle.com/wisla created

$ kubectl -n oracle-database get pods
NAME           READY   STATUS    RESTARTS   AGE
wisla-ly4z2    1/1     Running   0          14m
zywiec-q14pe   1/1     Running   0          26m

$ kubectl -n oracle-database get singleinstancedatabase
NAME     EDITION      STATUS    ROLE      VERSION      CONNECT STR            TCPS CONNECT STR   OEM EXPRESS URL
wisla    Enterprise   Healthy   PRIMARY   21.3.0.0.0   10.1.2.46:30531/ORAW   Unavailable        https://10.1.2.46:30474/em
zywiec   Enterprise   Healthy   PRIMARY   21.3.0.0.0   10.1.2.46:32102/ORAZ   Unavailable        https://10.1.2.46:31749/em

From the above output it is impossible to tell which instance is the original primary and which is the clone. Maybe that could be highlighted somehow in a future release? As far as I understand, the clone is like a snapshot or a backup taken at a specific time and there is no syncing; the two primary instances are not clustered.

Best regards, Andreas

andbos avatar Feb 02 '24 09:02 andbos

Aha, now I see that you can determine which instance is a clone by checking Clone From under Spec and Status in the kubectl describe singleinstancedatabase output.

$ kubectl --kubeconfig ~/.kube/config-sinch-op-smsf-1-andbos -n oracle-database describe singleinstancedatabase wisla
Name:         wisla
Namespace:    oracle-database
Labels:       <none>
Annotations:  <none>
API Version:  database.oracle.com/v1alpha1
Kind:         SingleInstanceDatabase
Metadata:
  Creation Timestamp:  2024-02-02T08:57:59Z
  Finalizers:
    database.oracle.com/singleinstancedatabasefinalizer
  Generation:        1
  Resource Version:  156829610
  UID:               3a3ec8cc-302e-4320-aabc-22f2feebc918
Spec:
  Admin Password:
    Keep Secret:  true
    Secret Key:   oracle_pwd
    Secret Name:  db-admin-secret
  Clone From:     zywiec
  Image:
    Pull From:     container-registry.oracle.com/database/enterprise:latest
    Pull Secrets:  oracle-container-registry-secret
  Init Params:
  Pdb Name:  ORCLPDB1
  Persistence:
    Access Mode:    ReadWriteOnce
    Size:           100Gi
    Storage Class:  gp2
  Replicas:         1
  Sid:              ORAW
Status:
  Archive Log:             false
  Clone From:              zywiec
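
The same information can probably be shown for all instances at once. This is a sketch that assumes the describe field Clone From corresponds to `.spec.cloneFrom` in the resource, which matches the `cloneFrom` key edited in the sample YAML. The function name `list_clone_sources` is made up for illustration.

```shell
# List every SingleInstanceDatabase in namespace $1 next to the resource it was
# cloned from. Assumes the field path .spec.cloneFrom; kubectl is expected to
# show <none> for resources where that field is not set.
list_clone_sources() {
  kubectl -n "$1" get singleinstancedatabase \
    -o custom-columns='NAME:.metadata.name,CLONE_FROM:.spec.cloneFrom'
}

# Possible usage:
#   list_clone_sources oracle-database
```

This avoids running describe on each resource in turn when several databases share a namespace.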

andbos avatar Feb 02 '24 09:02 andbos

@andbos closing this issue, as I think it was resolved by expanding the block volume attached to the sidb resource pod.

Feel free to reopen or reach out if the problem persists.

IshaanDesai45 avatar Jun 18 '24 10:06 IshaanDesai45