DG Broker doesn't detect that primary database is down
Hi,
It seems DG Broker cannot detect that the primary database is down; its status stays Healthy the whole time.
Standby detected that the primary is down:
rfs (PID:2778): Possible network disconnect with primary database [krsv.c:4855]
rfs (PID:2778): while processing B-1171608090.T-1.S-21 [krsv.c:4861]
2024-06-14T10:18:55.090551+00:00
***********************************************************************
Fatal NI connect error 12541, connecting to:
(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=db11)(INSTANCE_NAME=DB11)(CID=(PROGRAM=oracle)(HOST=sinchdb12-dwake)(USER=oracle))(CONNECTION_ID=Gtegq8X5CdTgYxoCAQoNsQ==))(ADDRESS=(PROTOCOL=tcp)(HOST=172.20.114.142)(PORT=1521)))
VERSION INFORMATION:
TNS for Linux: Version 21.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 21.0.0.0.0 - Production
Version 21.3.0.0.0
Time: 14-JUN-2024 10:18:55
Tracing not turned on. Process Id = 2516
Tns error struct:
ns main err code: 12541
TNS-12541: TNS:no listener
ns secondary err code: 12560
nt main err code: 511
TNS-00511: No listener
nt secondary err code: 111
nt OS err code: 0
***********************************************************************
DG Broker did not, even though the status of the primary is Pending:
$ date
Fri Jun 14 12:41:20 CEST 2024
$ kubectl -n oracle-database get pods
NAME         READY   STATUS     RESTARTS   AGE
db11-nfwnl   0/1     Init:1/2   0          23m
db12-dwake   1/1     Running    0          54m
$ kubectl -n oracle-database get singleinstancedatabase
NAME   EDITION      STATUS    ROLE               VERSION      CONNECT STR             TCPS CONNECT STR   OEM EXPRESS URL
db11   Enterprise   Pending   PRIMARY            21.3.0.0.0   10.1.1.161:32480/DB11   Unavailable        https://10.1.1.161:30473/em
db12   Enterprise   Healthy   PHYSICAL_STANDBY   21.3.0.0.0   10.1.2.200:30739/DB12   Unavailable        https://10.1.2.200:30875/em
$ kubectl -n oracle-database get dataguardbroker
NAME                 PRIMARY   STANDBYS   PROTECTION MODE   CONNECT STR                  STATUS
dataguardbroker-db   DB11      DB12       MaxAvailability   10.1.1.161:31036/DATAGUARD   Healthy
$ kubectl -n oracle-database describe dataguardbroker
Name: dataguardbroker-db
Namespace: oracle-database
Labels: <none>
Annotations: <none>
API Version: database.oracle.com/v1alpha1
Kind: DataguardBroker
Metadata:
Creation Timestamp: 2024-06-14T09:55:59Z
Finalizers:
database.oracle.com/dataguardbrokerfinalizer
Generation: 1
Resource Version: 94229431
UID: d5703585-503c-46f6-be34-9a3bdb3a40df
Spec:
Fast Start Fail Over:
Primary Database Ref: db11
Protection Mode: MaxAvailability
Set As Primary Database: DB11
Standby Database Refs:
db12
Status:
Cluster Connect String: dataguardbroker-db.oracle-database:1521/DATAGUARD
External Connect String: 10.1.1.161:31036/DATAGUARD
Primary Database: DB11
Primary Database Ref: db11
Protection Mode: MaxAvailability
Standby Databases: DB12
Status: Healthy
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DG Configuration up to date 45m DataguardBroker
Setup: one primary singleinstancedatabase and one standby singleinstancedatabase, both using image enterprise:21.3.0.0. OraOperator version: 1.1.0.
Best regards, Andreas
Is it somehow possible to fail over to the standby? In this case the primary instance is beyond rescue, and I can't switch over because the primary is unreachable.
If I had a clone, I suppose I could have pointed the application to the SID of the clone and then created a new standby and configured DG Broker again. In a disaster recovery situation, I mean. Another option could be to have multiple replicas of the primary.
Best regards, Andreas
@andbos Right now we only support manual switchover to the standby when all the databases are healthy, since the DataguardBroker controller is still in preview release.
We plan to implement failover, but that requires a database observer, which is a roadmap item for the next release and will be implemented in v1.2.0.
Hi,
Thanks for the update, appreciated. Roughly when could we expect v1.2.0?
Best regards, Andreas
We are still discussing the timeline for 1.2.0.
P.S. If you want to switch over when the primary is down with the current implementation of the DataguardBroker controller, you can exec into the standby database and manually run the DGMGRL command for the switchover:
- Log into the DGMGRL shell: DGMGRL sys@<pwd>
- Run: SWITCHOVER TO <standby_database_sid>
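The exec step above can be scripted. The following is only a minimal sketch: it builds (but does not run) a `kubectl exec` command line that starts DGMGRL inside a database pod. The helper name `dgmgrl_exec_argv` is hypothetical, the namespace and pod name are taken from the output earlier in this thread, and the default connect string `/` assumes OS authentication works inside the container, which may not hold for every image. A role change may need explicit SYS credentials passed as `connect` instead.

```python
import shlex
from typing import List


def dgmgrl_exec_argv(namespace: str, pod: str, command: str, connect: str = "/") -> List[str]:
    """Build the argv for running one DGMGRL command inside a pod.

    Hypothetical helper: `connect` defaults to "/" (OS authentication),
    which is an assumption about the container, not something the
    operator documents.
    """
    return [
        "kubectl", "-n", namespace, "exec", pod, "--",
        "dgmgrl", "-silent", connect, command,
    ]


argv = dgmgrl_exec_argv("oracle-database", "db12-dwake", "show configuration")
# Print a copy-pasteable shell line; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Building the argument list instead of a single string avoids quoting problems when the DGMGRL command contains spaces.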
Hi,
Thanks for the tip. To start with, I tried executing SWITCHOVER TO when the primary was up - it worked:
DGMGRL for Linux: Release 21.0.0.0.0 - Production on Mon Jun 24 09:59:33 2024
Version 21.13.0.0.0
Copyright (c) 1982, 2021, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "DB11"
Connected as SYSDBA.
DGMGRL> show configuration
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
db11 - Primary database
db12 - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 27 seconds ago)
DGMGRL> switchover to db12
2024-06-24T09:59:45.339+00:00
Performing switchover NOW, please wait...
2024-06-24T09:59:45.490+00:00
Operation requires a connection to database "db12"
Connecting ...
Connected to "DB12"
Connected as SYSDBA.
2024-06-24T09:59:45.534+00:00
Continuing with the switchover...
2024-06-24T09:59:52.273+00:00
New primary database "db12" is opening...
2024-06-24T09:59:52.273+00:00
Operation requires start up of instance "DB11" on database "db11"
Starting instance "DB11"...
Connected to an idle instance.
ORACLE instance started.
Connected to "DB11"
Database mounted.
Database opened.
Connected to "DB11"
2024-06-24T10:00:10.370+00:00
Switchover succeeded, new primary is "db12"
2024-06-24T10:00:10.373+00:00
Switchover processing complete, broker ready.
DGMGRL> show configuration
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
db12 - Primary database
db11 - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 18 seconds ago)
However, the DataGuardBroker didn't notice that a switchover was done.
$ date
Mon Jun 24 12:02:09 CEST 2024
$ kubectl --kubeconfig ~/.kube/config-sinch-op-smsf-1-andbos -n oracle-database get dataguardbroker
NAME            PRIMARY   STANDBYS   PROTECTION MODE   CONNECT STR                  STATUS
sidb-dgbroker   DB11      DB12       MaxAvailability   10.1.1.161:32495/DATAGUARD   Healthy
Best regards, Andreas
But when I restarted the new standby (db11) DataGuardBroker reported status Healthy the whole time despite active ORA errors:
DGMGRL> show configuration
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
db12 - Primary database
Error: ORA-16810: multiple errors or warnings detected for the member
db11 - Physical standby database
Error: ORA-16599: Oracle Data Guard broker detected a stale configuration
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 51 seconds ago)
Not even any events:
$ kubectl -n oracle-database describe dataguardbroker
Name: sidb-dgbroker
Namespace: oracle-database
Labels: <none>
Annotations: <none>
API Version: database.oracle.com/v1alpha1
Kind: DataguardBroker
Metadata:
Creation Timestamp: 2024-06-24T09:47:05Z
Finalizers:
database.oracle.com/dataguardbrokerfinalizer
Generation: 1
Resource Version: 133480469
UID: de3110c0-78e2-401a-9f91-c245b8519273
Spec:
Fast Start Fail Over:
Primary Database Ref: sidb11
Protection Mode: MaxAvailability
Set As Primary Database: DB11
Standby Database Refs:
sidb12
Status:
Cluster Connect String: sidb-dgbroker.oracle-database:1521/DATAGUARD
External Connect String: 10.1.1.161:32495/DATAGUARD
Primary Database: DB11
Primary Database Ref: sidb11
Protection Mode: MaxAvailability
Standby Databases: DB12
Status: Healthy
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal DG Configuration up to date 20m DataguardBroker
@andbos That is true: the DG Broker controller does not detect the switchover when it is done manually, because the controller's reconcile has not been triggered.
To detect manual configuration changes as well, we would depend on the database observer, which is planned for the next release.
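Until an observer exists, drift between the resource status and the real broker state could be checked from outside the operator. The following is a minimal sketch, assuming the `show configuration` text layout shown in this thread. `DgConfig`, `parse_show_configuration` and `drifted` are hypothetical names and not part of OraOperator.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DgConfig:
    primary: Optional[str] = None
    standbys: List[str] = field(default_factory=list)
    status: Optional[str] = None
    errors: List[str] = field(default_factory=list)


MEMBER_RE = re.compile(r"^\s*(\S+)\s+-\s+(Primary|Physical standby) database\s*$")
CODE_RE = re.compile(r"\b((?:ORA|TNS)-\d+)\b")
STATUS_RE = re.compile(r"^\s*(SUCCESS|WARNING|ERROR)\b")


def parse_show_configuration(text: str) -> DgConfig:
    """Extract member roles, error codes and overall status from DGMGRL output."""
    cfg = DgConfig()
    for line in text.splitlines():
        member = MEMBER_RE.match(line)
        if member:
            name, role = member.groups()
            if role == "Primary":
                cfg.primary = name
            else:
                cfg.standbys.append(name)
        elif line.strip().startswith(("Error:", "Warning:")):
            cfg.errors.extend(CODE_RE.findall(line))
        else:
            status = STATUS_RE.match(line)
            if status:
                cfg.status = status.group(1)
    return cfg


def drifted(cr_primary: str, cfg: DgConfig) -> bool:
    """True if the resource's PRIMARY disagrees with DGMGRL or the config is unhealthy."""
    same_primary = (cfg.primary or "").upper() == cr_primary.upper()
    return cfg.status != "SUCCESS" or not same_primary


sample = """\
Configuration - dg_config
  Protection Mode: MaxAvailability
  Members:
  db12 - Primary database
    Error: ORA-16810: multiple errors or warnings detected for the member
  db11 - Physical standby database
    Error: ORA-16599: Oracle Data Guard broker detected a stale configuration
Fast-Start Failover:  Disabled
Configuration Status:
ERROR   (status updated 51 seconds ago)
"""
cfg = parse_show_configuration(sample)
print(cfg.primary, cfg.status, cfg.errors)
print(drifted("DB11", cfg))
```

The comparison is case-insensitive because the resource reports `DB11` while DGMGRL lists the member as `db11`.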
OK, I see. Worse, if I take down the primary and then attempt a switchover or failover to the standby, the operation fails:
DGMGRL for Linux: Release 21.0.0.0.0 - Production on Mon Jun 24 10:11:09 2024
Version 21.13.0.0.0
Copyright (c) 1982, 2021, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "DB12"
Connected as SYSDBA.
DGMGRL> show configuration
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
db11 - Primary database
db12 - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 60 seconds ago)
DGMGRL> show configuration
Configuration - dg_config
Protection Mode: MaxAvailability
Members:
db11 - Primary database
Error: ORA-12541: TNS:no listener
db12 - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
ERROR (status updated 0 seconds ago)
DGMGRL> failover to db12
ORA-16600: not connected to target standby database
DGMGRL> switchover to db12
2024-06-24T10:14:57.686+00:00
Performing switchover NOW, please wait...
Error: ORA-12541: TNS:no listener
Error: ORA-16625: cannot reach member "db11"
Failed.
2024-06-24T10:14:59.729+00:00
Unable to switchover, primary database is still "db11"
DGMGRL>