Wednesday, December 29, 2021

RMAN, Incarnation, & ME part I: “ORA-19906”: Recovery target incarnation changed during recovery

This image has an empty alt attribute; its file name is image-6.pngIntro

If there is something I honestly always struggled with as a DBA, it’s RMAN incarnation. Both mystical like its name suggests and vital, ensuring absolute consistency of your database and backups. But deep down, you always know it’ll back fire one day out of the shadows. Because that’s where it hides, lurking until a human  error happens as one messes up with archive logs or RMAN retentions. It’s no surprise that my skin starts  crawling whenever I see that word show up in an RMAN Horror error. Today we’ll just explore one of the many cases leading to this ORA-19906 error as it can be anything.   

Use case and Configuration (RMAN Refresh)

  • This didn’t occur during a proper Database restore but rather during a refresh from production.
  • I was asked to replace the client’s legacy procedure to refresh 12c databases from production by a time effective alternative which is why I picked RMAN duplicate method (more popular for creating standby DBs). let’s describe the course of events prior to the incident

  • Configuration:
    Source: Oracle 12c NonCDB with enabled archive logging and backup. OS: Windows server
    Test: Similar setup than the production where databases used to be refreshed using a simple restore
    procedure by-weekly.

2. The steps

  • Drop the old refreshed database to be replaced and keep a copy of he pfile
  • Import the password file from source instance to destination DB and rename it to fit the target instance
  • Add appropriate entry into "tnsnames.ora" to allow connections to target DB from the duplicate server.

    ProDB-source =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = Prodserver)(PORT = 1521))
          (SEND_BUF_SIZE = 65535)
          (RECV_BUF_SIZE = 65535)
          (CONNECT_DATA =    (SERVER = DEDICATED)
            (SERVICE_NAME = ProDB)))

  • Directories for data and logs as some system parameters will be adjusted in RMAN duplicate script.
  • Directories necessary for starting the duplicate database are already there (.i.e audit)
  • Make the backup files from the source database available to the destination server(\\Share\PBKUP$\)
  • Start the duplicate instance in NoMount mode using the pfile copy saved earlier 
  • RMAN Connect to the duplicate database as (AUXILIARY) and source DB as TARGET

    $ set ORACLE_SID=testDB
    SQL> STARTUP NOMOUNT PFILE='**\initTargetDB.ora';
    C:> rman TARGET sys@ProdDB-SOURCE AUXILIARY /   msglog=targetDB_duplicate.log

  • Run Rman duplicate script (make sure the last controlfile backup is older than last archivelog backup)

    @rman_duplicate.rman

    Run {
    DUPLICATE DATABASE TO testDB
    until SCN XXXXXXXXX; # to be checked in prod
    SPFILE # params to adjust
    parameter_value_convert ('ProDB','targetDB')
    set db_file_name_convert='D:\ORACLE\ORADATA\ProDB\','D:\ORACLE\ORADATA\targetDB\'
    'F:\ORACLE\ORADATA\ProDB\TEMP\','F:\ORACLE\ORADATA\targetDB\TEMP\
    set log_file_name_convert='F:\ORACLE\ORADATA\ProDB\','F:\ORACLE\ORADATA\targetDB\'
    set db_name='testDB'
    set db_unique_name='testDB'
    set db_recovery_file_dest_size='xx'
    set sga_target='xx'
    set pga_aggregate_target='xx'
    BACKUP LOCATION '\\Share\PBKUP$\'
    NOFILENAMECHECK;
      }


Error New incarnation created

The duplicate command completes a restore but fails when the recovery fails  

Oracle instance shut down
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/28/2020 12:45:11
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover logfile
'F:\ORACLE\TesttDB\ARCHIVELOG\2020_05_28\O1_MF_1_16380_HDZTLDHN_.ARC'

ORA-00283: recovery session canceled due to errors
ORA-19906: recovery target incarnation changed during recovery

 Alert log message:

alter database recover if needed
start until change 15446103440 using backup controlfile
Media Recovery Start Started logmerger process
Thu May 28 05:13:23 2020   Parallel Media Recovery started with 4 slaves
Setting recovery target incarnation to 3 # What the heck ?????
ORA-279 signalled during: alter database recover if needed
... alter database recover logfile
'F:\ORACLE\testDB\ARCHIVELOG\2020_05_28\O1_MF_1_16380_HDZ05NW3_.ARC'
Thu May 28 05:13:50 2020 Errors with log O1_MF_1_16380_HDZ05NW3_.ARC
Thu May 28 05:13:50 2020   Media Recovery failed with error 19906
Recovery interrupted!
ORA-00283: recovery session canceled due to errors
ORA-19906: recovery target incarnation changed during recovery

Investigation:

== DUPLICATE target

RMAN> list incarnation;
List of Database Incarnations
DB Key  IncKey DB Name  DB ID     STATUS  Reset SCN  Reset Time
------- ------ ------- ---------- ------- ---------- ----------
1       1      ProDB   1322653252 ORPHAN  1          03-NOV-11
2       2      ProDB   1322653252 ORPHAN  994063     18-JAN-13
3       3      ProDB   1322653252 CURRENT 14331907298 25-APR-19 == Ghost incarnation

>>> Incarnation 3 on production that doesn’t exist

== PROD source

RMAN> list incarnation;
List of Database Incarnations
DB Key Inc Key DB Name  DB ID     STATUS  Reset SCN  Reset Time
------ ------- ------- ---------- ------- --------- -----------
1       1      ProDB    1322653252 PARENT  1         03-nov-2011
2       2      ProDB    1322653252 CURRENT 994063    18-jan-2013 == valid Incarnation

Incarnation 3 on production doesn’t exist but is visible in the duplicate Database.
Let’s try to c
heck and clean any leftover in the backup location '\\Share\PBKUP$\'

RMAN> crosscheck  backupset;
Crosschecked 6 objects

There was 13 files in that directory among which 7 where outdated and not even spotted by RMAN crosscheck . Cleaning them and crosschecking didn’t fix the problem so

Real Root cause

  • I was so obsessed with the source backup location that I forgot to read the full path of the archive involved in the error.   
  • In fact, this happened because the FRA directory on the duplicate server wasn’t cleaned before running the RMAN duplicate script. Hence RMAN will implicitly catalog all old files including the Archive logs and add them to the media recovery after the restore is done (triggering new incarnation that actually belonged to the past) .     

'F:\ORACLE\testDB\ARCHIVELOG\2020_05_28\O1_MF_1_16380_HDZ05NW3_.ARC'

Solution I removed all old Archive logs and the duplicate finished with no errors after that.

This image has an empty alt attribute; its file name is image-7.png



Conclusion

The Rman lesson of the day to remember is: "Always clean up the house before moving in..".
That is clean the existing  FRA folders on the duplicate destination before the refresh ;).

PS: You’d be surprise to know that this error popped up for other users a decade ago on RMAN 10g

No comments:

Post a Comment