Monday, September 4, 2017

Dataguard: Redoroutes for Alternate Destinations Redo data Transport failover

Using the RedoRoutes Property for Remote Alternate Destinations within a Dataguard environments. The RedoRoutes property can be used to set up a remote alternate destination so that a terminal Standby Database ( Standby DB, Cascade Standby DB, FarSync DB) can still receive redo data even if the Standby Database from which it was receiving the redo data fails.

In this example below we have a primary cdb1 who send his redo data to the farsync instance. The farsync cdb1fs send the redo data to the standby database cdb2. See the instructions for building the dataguard environnment "MaxAvailability DataGuard environment : Adding a far sync instance to the environment". Is it possible to configure a setup, when the primary database, cdb1, send the redo data directly to standby database CDB2 in a case that the farsync csb1fs is not available.  And when the FAR SYNC instance cdb1fs is operational again the configuration is operational again as designed.
In Oracle 12.2 we threw away the ALTERNATE attribute of redo transport and replacing this with 2 new attributes -­> GROUP and PRIORITY
  • GROUP      : Says we are all potential alternates of each other in the same group
  • PRIORITY : Tells DataGuard when to fallback to the original target between groups
You can use the DGMGRL command-line interface to configure and implement the enhanced redo destinations capability provided by the GROUP and PRIORITY attributes of the LOG_ARCHIVE_DEST_n parameter (Advanced Redo Transport Settings).  

Setup an alternate destination when farsync instance is failing

Status of the Dataguard

DGMGRL> show configuration 

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    cdb1fs - Far sync instance 
      cdb2   - Physical standby database 

Fast-Start Failover: DISABLED

DGMGRL> show database verbose cdb1 RedoRoutes
  RedoRoutes = '(cdb1:cdb1fs sync)'

DGMGRL> show far_sync verbose cdb1fs RedoRoutes
  RedoRoutes = '(cdb1:cdb2 async)'

DGMGRL> show database verbose cdb2 RedoRoutes
  RedoRoutes = '(cdb2:cdb1 sync)'

Configure an alternate destination for farsync instance cdb1fs for the cdb1 primary database. It will go directly to standby database cdb2 and in a dataguard  MAX AVAILABILITY DataGuard setting  

DGMGRL> EDIT DATABASE cdb1 SET PROPERTY 'RedoRoutes' = '(cdb1 : ( cdb1fs SYNC PRIORITY=1, cdb2 ASYNC PRIORITY=2 ))';

DGMGRL> show configuration

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    cdb1fs - Far sync instance 
      cdb2   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 50 seconds ago)

In the current command we don't see the configuration of the redoroutes alternatives. 

DGMGRL> show configuration verbose

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    cdb1fs - Far sync instance 
      cdb2   - Physical standby database 
    cdb2   - Physical standby database (alternate of cdb1fs)

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'cdb1_CFG'

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Using the verbose attribute shows the alternate configuration setup for the farsync instance cdb1fs for the redo data coming from the primary database cdb1.

Logging information in the alert logs
Looking in the alert logs of the databases we notice the the following "ALTER SYSTEM commands"

Information alert_cdb1.log
==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
ALTER SYSTEM SET log_archive_dest_2='service="cdb1fs"','SYNC AFFIRM delay=0 optional compression=disable max_failure=1 max_connections=1 reopen=300 db_unique_name="cdb1fs" net_timeout=30','group=1 priority=1 valid_for=(online_logfile,all_roles)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_3='service="cdb2"','ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=1 max_connections=1 reopen=300 db_unique_name="cdb2" net_timeout=30','group=1 priority=2 valid_for=(online_logfile,all_roles)' SCOPE=BOTH;
ALTER SYSTEM SWITCH ALL LOGFILE start (cdb1)

Information alert_cdb1fs.log
==> /home/oracle/app/diag/rdbms/cdb1fs/cdb1fs/trace/alert_cdb1fs.log <==
ALTER SYSTEM SET fal_server='cdb1','cdb2' SCOPE=BOTH;

Information alert_cdb2.log
==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
ALTER SYSTEM SET fal_server='cdb1fs','cdb1' SCOPE=BOTH;

Introducing a failure of the farsync instance cdb1fs. Stop the farsync instance cdb1fs by using the OS command kill -9 <ospid process smon of cdb1fs>.  

Logging information in the alert logs

==> /home/oracle/app/diag/rdbms/cdb1fs/cdb1fs/trace/alert_cdb1fs.log <==
Instance Critical Process (pid: 22, ospid: 2815, SMON) died unexpectedly
PMON (ospid: 2721): terminating the instance due to error 474
System state dump requested by (instance=1, osid=2721 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /home/oracle/app/diag/rdbms/cdb1fs/cdb1fs/trace/cdb1fs_diag_2791_20170830001708.trc

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
RFS[4]: Assigned to RFS process (PID:3022)
RFS[4]: Possible network disconnect with primary database
RFS[2]: Possible network disconnect with primary database
RFS[1]: Possible network disconnect with primary database
RFS[3]: Possible network disconnect with primary database

==> /home/oracle/app/diag/rdbms/cdb1fs/cdb1fs/trace/alert_cdb1fs.log <==
Dumping diagnostic data in directory=[cdmp_20170830001708], requested by (instance=1, osid=2721 (PMON)), summary=[abnormal instance termination].

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3113)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Errors in file /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/cdb1_lgwr_3117.trc:
ORA-03113: end-of-file on communication channel
Error 3113 for archive log file 1 to 'cdb1fs'
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED

==> /home/oracle/app/diag/rdbms/cdb1fs/cdb1fs/trace/alert_cdb1fs.log <==
Instance terminated by PMON, pid = 2721

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
LGWR: Failed to archive log 1 thread 1 sequence 96 (3113)
LGWR: Error 1041 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'cdb1fs'
Destination LOG_ARCHIVE_DEST_2 no longer supports SYNCHRONIZATION
Thread 1 advanced to log sequence 97 (LGWR switch)
  Current log# 2 seq# 97 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_2_dsts05kj_.log
  Current log# 2 seq# 97 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_2_dsts075w_.log
Archived Log entry 167 added for T-1.S-96 ID 0x3740d1fd LAD:1

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
RFS[5]: Assigned to RFS process (PID:5257)
RFS[5]: Selected log 4 for T-1.S-96 dbid 926465619 branch 952748181

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
ARC2: Standby redo logfile selected for thread 1 sequence 96 for destination LOG_ARCHIVE_DEST_3

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM PERFORMANCE mode
Re-archiving standby log 4 T-1.S-96
RFS[6]: Assigned to RFS process (PID:5259)
RFS[6]: Selected log 5 for T-1.S-97 dbid 926465619 branch 952748181

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
TT04: Standby redo logfile selected for thread 1 sequence 97 for destination LOG_ARCHIVE_DEST_3

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
Expanded controlfile section 11 from 110 to 220 records
Requested to grow by 110 records; added 4 blocks of records
2017-08-30T00:17:11.098683-07:00
Archived Log entry 111 added for T-1.S-96 ID 0x3740d1fd LAD:1

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
ALTER SYSTEM SET log_archive_dest_state_3='ENABLE' SCOPE=MEMORY SID='*';

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
Media Recovery Waiting for thread 1 sequence 97 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 97 Reading mem 0
  Mem# 0: /home/oracle/app/oradata/CDB2/onlinelog/o1_mf_5_dsv0nwny_.log
  Mem# 1: /home/oracle/app/orafra/cdb1/CDB2/onlinelog/o1_mf_5_dsv0nwv0_.log

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
Thread 1 advanced to log sequence 98 (LGWR switch)
  Current log# 3 seq# 98 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_3_dsts05mj_.log
  Current log# 3 seq# 98 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_3_dsts0758_.log
Archived Log entry 169 added for T-1.S-97 ID 0x3740d1fd LAD:1

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
Archived Log entry 112 added for T-1.S-97 ID 0x3740d1fd LAD:1
Media Recovery Waiting for thread 1 sequence 98
Primary database is in MAXIMUM PERFORMANCE mode
RFS[7]: Assigned to RFS process (PID:5274)
RFS[7]: Selected log 4 for T-1.S-98 dbid 926465619 branch 952748181

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
TT03: Standby redo logfile selected for thread 1 sequence 98 for destination LOG_ARCHIVE_DEST_3

Information DataGuard broker
DGMGRL> show configuration

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

    cdb2   - Physical standby database (alternate of cdb1fs)

  Members Not Receiving Redo:
  cdb1fs - Far sync instance 
    Warning: ORA-1034: ORACLE not available

Fast-Start Failover: DISABLED

Configuration Status:
WARNING   (status updated 44 seconds ago)

Summary of the changes, caused by a failing farsync instance cdb1fs.
The Primary database cdb1 activates the LOG_ARCHIVE_DEST_3 ( the alternate destination) for the redo data delivery to the standby database.
The farsync cdb1fs is not the delivery the redo data any more to the standby database cdb2.
And a dataguard protection mode switch, Primary database is in MAXIMUM PERFORMANCE mode - Changing standby controlfile to MAXIMUM PERFORMANCE mode.
The alternate redo data stream was reconfigured by using ASYNC.

Start the farSync instance cdb1fs
When we active the farsync cdb1fs would the configuration be restored

Start the farSync instance cdb1fs

sqlplus startup mount cdb1fs

By following the alert logs of the databases we see the following happening at database level
==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
alter system switch logfile
Thread 1 cannot allocate new log, sequence 99
Checkpoint not complete
  Current log# 3 seq# 98 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_3_dsts05mj_.log
  Current log# 3 seq# 98 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_3_dsts0758_.log
Thread 1 advanced to log sequence 99 (LGWR switch)
  Current log# 1 seq# 99 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_1_dsts05k5_.log
  Current log# 1 seq# 99 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_1_dsts05vz_.log
Archived Log entry 171 added for T-1.S-98 ID 0x3740d1fd LAD:1

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
RFS[7]: Selected log 5 for T-1.S-99 dbid 926465619 branch 952748181

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
TT03: Standby redo logfile selected for thread 1 sequence 99 for destination LOG_ARCHIVE_DEST_3

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==

Archived Log entry 113 added for T-1.S-98 ID 0x3740d1fd LAD:1
Media Recovery Waiting for thread 1 sequence 99 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 99 Reading mem 0
  Mem# 0: /home/oracle/app/oradata/CDB2/onlinelog/o1_mf_5_dsv0nwny_.log
  Mem# 1: /home/oracle/app/orafra/cdb1/CDB2/onlinelog/o1_mf_5_dsv0nwv0_.log

....
....

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1.log <==
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
LGWR: Standby redo logfile selected to archive thread 1 sequence 109
LGWR: Standby redo logfile selected for thread 1 sequence 109 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 109 (LGWR switch)
  Current log# 2 seq# 109 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_2_dsts05kj_.log
  Current log# 2 seq# 109 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_2_dsts075w_.log
Archived Log entry 193 added for T-1.S-108 ID 0x3740d1fd LAD:1
LGWR: Standby redo logfile selected to archive thread 1 sequence 110
LGWR: Standby redo logfile selected for thread 1 sequence 110 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 110 (LGWR switch)
  Current log# 3 seq# 110 mem# 0: /home/oracle/app/oradata/CDB1/onlinelog/o1_mf_3_dsts05mj_.log
  Current log# 3 seq# 110 mem# 1: /home/oracle/app/orafra/cdb1/CDB1/onlinelog/o1_mf_3_dsts0758_.log
Archived Log entry 195 added for T-1.S-109 ID 0x3740d1fd LAD:1
LGWR: Standby redo logfile selected to archive thread 1 sequence 111
LGWR: Standby redo logfile selected for thread 1 sequence 111 for destination LOG_ARCHIVE_DEST_2
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary

==> /home/oracle/app/diag/rdbms/cdb1/cdb1/trace/alert_cdb1fs.log <==  
Changing standby controlfile to MAXIMUM PERFORMANCE mode
RFS[2]: Assigned to RFS process (PID:5848)
RFS[2]: Selected log 6 for T-1.S-107 dbid 926465619 branch 952748181
Archived Log entry 21 added for T-1.S-107 ID 0x3740d1fd LAD:1
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
Changing standby controlfile to MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary

==> /home/oracle/app/diag/rdbms/cdb2/cdb2/trace/alert_cdb2.log <==
2017-08-30T00:27:50.253227-07:00
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
RFS[9]: Assigned to RFS process (PID:5865)
RFS[9]: Selected log 4 for T-1.S-109 dbid 926465619 branch 952748181

Dataguard configuration validation and sqlplus

SQL> select protection_mode,protection_level from v$database;
 
PROTECTION_MODE      PROTECTION_LEVEL
-------------------- --------------------
MAXIMUM AVAILABILITY RESYNCHRONIZATION

DGMGRL> show configuration

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    Warning: ORA-16629: database reports a different protection level from the protection mode

    cdb1fs - Far sync instance 
      cdb2   - Physical standby database 

Fast-Start Failover: DISABLED
##wait
Configuration Status:
WARNING   (status updated 30 seconds ago)

DGMGRL> show configuration

Configuration - DRSolution

  Protection Mode: MaxAvailability
  Members:
  cdb1   - Primary database
    cdb1fs - Far sync instance 
      cdb2   - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 39 seconds ago)


It works as designed.

  1. The redoroutes attributes PRIORITY and GROUPS can be used to setup alternate redo data streams