Switching to a Physical Standby

Now that I’m back to doing more normal DBA work (see here for the reasons behind that), I’ll be posting some more how-to type posts from a DBA perspective, not necessarily an EM perspective. While I’m hoping these posts will be of interest to others, I also want to use them to keep track of just how I did different things so I can rinse and repeat as necessary. 😊 A special shout-out to John Evans, a colleague of mine at Red Stack Tech, who wrote the original document this post was based on.

This first post will be covering the process of performing a switchover from a primary database to a physical standby database. In this particular case, the two environments are running on Exadata machines (the first time I’ve really done anything with Exadata environments other than just monitoring them, so that’s cool!), with the RAC databases running on ASM. Obviously, this work was done for a particular customer, so your environment may vary from this one so please be prepared to take that into account (for example, the database version in this environment is 12.1.0.2). And equally obviously, no customer details will be shared in this post, so specifics like host names, database names and so forth will be deliberately obfuscated. In our initial configuration, DB1 is the primary database and DB2 is the physical standby database.

In the sample outputs below, commands entered on the primary are coloured red, while those on the standby are coloured blue. Commands entered on both are simply left black.

Initial Preparation

There’s a bunch of preparation work that needs to be done before the switchover process actually take place, to ensure the system is ready for switchover. Much of this can be done while the databases are up and running, and the apply process is still switched on.

Reviewing the Alert Logs

The first thing you should do is review the alert logs on both the primary and physical standby to see if there is any recent activity that might indicate a possible problem. There could be a number of errors that are problematic so you’ll need to use your best judgement on this one. If you are unsure where the trace directory is located, a simple SELECT statement can find it for you:

SQL> SELECT value FROM v$diag_info WHERE name = 'Diag Trace';

Checking Services

Make a note of which services run on the primary and standby sites. Since these are Exadata machines, everything has been configured in CRS to start depending on the database role. However, it’s worthwhile noting this so you can compare the services after the switchover and make sure they are running correctly (and restart them if not).
NOTE: In this environment, we need the DB_UNIQUE_NAME to be passed to the srvctl commands. You can find that by using the command:

SQL> SELECT name, db_unique_name, open_mode FROM v$database;

Check the services on the primary:

$ srvctl status service -d DB1
Service service01 is running on instance(s) INSTANCE1,INSTANCE2
Service service02 is not running.
Service service03 is running on instance(s) INSTANCE1,INSTANCE2

Now check them on the standby:

$ srvctl status service -d DB2
Service service01 is not running.
Service service02 is running on instance(s) INSTANCE1,INSTANCE2
Service service03 is not running.

Checking TEMP and Datafiles

The temporary files should match on both sides (i.e. they should exist).

SQL> set lines 180 pages 2000
SQL> col tmpfile for a60
SQL> SELECT tmp.name tmpfile, bytes/1024/1024/1024 GB,
  2         ts.name tablespace_name
  3  FROM gv$tempfile tmp, gv$tablespace ts
  4  WHERE tmp.ts# = ts.ts# AND tmp.inst_id = ts.inst_id
  5  GROUP BY tmp.name,ts.name, bytes;

If you find temp files on the primary that are not on the standby, the tablespaces will already be on the standby so you can just add tempfiles of the right size. On the standby:

SQL> CREATE TEMPORARY TABLESPACE temp_tbspace TEMPFILE '+DATA01' SIZE 1G;
create temporary tablespace temp_tbspace tempfile '+DATA01' size 1G
*
ERROR at line 1:
ORA-01543: tablespace 'TEMP_TBSPACE' already exists

SQL> SELECT tablespace_name FROM dba_tablespaces WHERE tablespace_name LIKE '%TEMP%';
TABLESPACE_NAME
------------------------------
TEMP
TEMP_TBSPACE

SQL> ALTER TABLESPACE temp_tbspace ADD TEMPFILE'+DATA01' SIZE 1g;

Also, no data files should be offline on either the primary or the standby (check on both):

SQL> SELECT tablespace_name, file_name FROM dba_data_files 
  2  WHERE online_status NOT IN ('SYSTEM','ONLINE');
no rows selected

Checking Flashback

If flashback is not turned on, recovering from a failed role transition can be difficult and messy. It is recommended to create restore points to allow faster recovery in the event of a failed role transition (this will be done just before switchover).

Check to see if flashback is enabled and how far back it currently reaches:

SQL> ALTER SESSION SET nls_date_format = 'DD-MON- RRRR HH24:MI:SS';
Session altered.

SQL> SELECT flashback_on FROM v$database;
FLASHBACK_ON
------------------
YES

SQL> SELECT oldest_flashback_time FROM v$flashback_database_log;
OLDEST_FLASHBACK_TIME
---------------------
29-JUL-2017 22:57:58

Checking LOG_ARCHIVE_DEST

In this environment, LOG_ARCHIVE_DEST_2 is the one that is used for sending archived logs to the standby:

SQL> show parameter log_archive_dest_2

NAME                       TYPE        VALUE
-------------------------- ----------- ------------------------------
log_archive_dest_2         string      service="service_stdby", ASYNC
                                       NOAFFIRM delay=0 optional comp
                                       ression=disable max_failure=0
                                       max_connections=1 reopen=300 d
                                       b_unique_name="DB2" n
                                       et_timeout=30, valid_for=(onli
                                       ne_logfile,all_roles)

Checking SYS Passwords

Check that you can connect from the primary to the standby (and vice versa) using the SYS account. On the primary:

$ sqlplus sys@DB2 as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 00:10:49 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
SQL>

And on the standby:

$ sqlplus sys@DB1 as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 00:10:49 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
SQL>

Checking the Password Files

Check where the password files and that they are identical. Firstly, on the primary:

$srvctl config database -d DB1
Database unique name: DB1_UNQ
Database name: DB1
Oracle home: /u01/app/oracle/product/12.1.0.2/db_1
Oracle user: oracle
Spfile: +DATA01/DB1/PARAMETERFILE/spfile.1053.920885823
Password file: +DATA01/DB1/PASSWORD/pwdprimary.1041.920885167
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA01,RECO01
Mount point paths:
Services: service01,service02,service03
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: DB11,DB12
Configured nodes: host01,host02
Database is administrator managed

Since the password file is in ASM, copy it out to a filesystem:

$ . oraenv
ORACLE_SID = [DB11] ? +ASM1
The Oracle base remains unchanged with value /u01/app/oracle
$ asmcmd
ASMCMD> pwcopy +DATA01/DB1_UNQ/PASSWORD/pwdDB1.1041.920885167 /tmp/pwfile_check
copying +DATA01/DB1_UNQ/PASSWORD/pwdDB1.1041.920885167167 -> /tmp/pwfile_check
ASMCMD> exit
$ md5sum /tmp/pwfile_check
f8419fc807c1f3c1dc2f99d01031acb8 /tmp/pwfile_check

Now do the same on the standby and compare the MD5 checksum values:

$ srvctl config database -d DB2
Database unique name: DB2_UNQ
Database name: DB2
Oracle home: /u01/app/oracle/product/12.1.0.2/db_1
Oracle user: oracle
Spfile: +DATA01/DB2/spfileDB2.ora
Password file: +DATA01/DB2_UNQ/orapwSTANDBY
Domain:
Start options: open
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA01,RECO01
Mount point paths:
Services: service01,service02,service03
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: DB21,DB22
Configured nodes: host03,host04
Database is administrator managed

$ . oraenv
ORACLE_SID = [DB2] ? +ASM1
The Oracle base remains unchanged with value /u01/app/oracle
$ asmcmd
ASMCMD> pwcopy +DATA01/DB2/orapwDB2 /tmp/pwfile_check
copying +DATA01/DB2/orapwDB2 -> /tmp/pwfile_check

$ md5sum /tmp/pwfile_check
f8419fc807c1f3c1dc2f99d01031acb8 /tmp/pwfile_check

The checksums match so we know the files are identical. If the checksums did not match, you would need to copy the password file from the primary to the standby. When using ASM, you would need to use pwcopy (in ASMCMD) to move the password file from the file system to ASM.

Checking the Existing Lag

There are two ways to check for any existing lag, either through SQL*Plus or the Data Guard broker. On the standby, you can use the following SQL:

SQL> set lines 150 pages 2000 long 9999999 longc 9999999 num 10
SQL> col comments for a25
SQL> col type for a20
SQL> col units for a10
SQL> ALTER SESSION SET nls_date_format = 'DD-MON-RRRR HH24:MI:SS';
SQL> SELECT inst_id, start_time, item, units, sofar, total, timestamp, comments
  2  FROM gv$recovery_progress ORDER BY start_time asc;
   INST_ID START_TIME            ITEM                             UNITS           SOFAR      TOTAL TIMESTAMP             COMMENTS
---------- --------------------- -------------------------------- ---------- ---------- ---------- --------------------- -------------------------
         2 15-JUN-2017 22:27:20  Active Apply Rate                KB/sec           2789          0
         2 15-JUN-2017 22:27:20  Active Time                      Seconds       1018595          0
         2 15-JUN-2017 22:27:20  Apply Time per Log               Seconds           970          0
         2 15-JUN-2017 22:27:20  Average Apply Rate               KB/sec            841          0
         2 15-JUN-2017 22:27:20  Checkpoint Time per Log          Seconds             0          0
         2 15-JUN-2017 22:27:20  Standby Apply Lag                Seconds             1          0
         2 15-JUN-2017 22:27:20  Last Applied Redo                SCN+Time            0          0 02-AUG-2017 00:30:35  SCN: 614771800
         2 15-JUN-2017 22:27:20  Log Files                        Files            1048          0
         2 15-JUN-2017 22:27:20  Maximum Apply Rate               KB/sec          29219          0
         2 15-JUN-2017 22:27:20  Redo Applied                     Megabytes     3342700          0
         2 15-JUN-2017 22:27:20  Elapsed Time                     Seconds       4068194          0
11 rows selected.

Check the “Last Applied Redo” column – the timestamp should be within a few seconds of the current wall clock time. Run the query a few times to check it is incrementing, just in case.

For the Data Guard broker, you can run that from the primary database host:

$ dgmgrl /
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> show configuration
Configuration - DB1
Protection Mode: MaxPerformance
Members:
DB1_UNQ - Primary database
DB2_UNQ - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 10 seconds ago)
DGMGRL> show database 'DB2_UNQ';
Database – DB2_UNQ
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 4.12 MByte/s
Real Time Query: ON
Instance(s):
INSTANCE1 (apply instance)
INSTANCE2
Database Status:
SUCCESS

Make sure that Configuration Status is SUCCESS, and check that the transport lag is 0 and the apply lag is less than 10 minutes.

If all the above is OK, you should be ready to start the actual switchover operation, but first start tailing the alert logs on both the primary and standby sites. Where possible, use something like VNC Viewer or GNU screen so that your sessions remain open even if you get disconnected.

Switchover

Creating Restore Points

Just for ease of recoverability, it’s best to create a restore point on both the primary and the standby before starting the actual switchover operation. It’s best to do this in the order of standby first, and then the primary, which ensures the standby restore point can be used to flash back to a time before the primary. If the standby is open in read only mode, you will need to restart it in mount mode first:

SQL> SELECT database_role, open_mode FROM v$database;
DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY
SQL> exit

Now use srvctl to stop the database and mount it:

$ srvctl stop database -d DB2_UNQ -o immediate
$ srvctl start database -d DB2_UNQ -o mount

Next, set the state to APPLY_OFF:

$ dgmgrl /
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> show database "DB2_UNQ"
Database – DB2_UNQ
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 40.56 MByte/s
Real Time Query: OFF
Instance(s):
INSTANCE1
INSTANCE2 (apply instance)
Database Status:
SUCCESS

DGMGRL> edit database 'DB2_UNQ' set state='APPLY-OFF';
Succeeded.
DGMGRL> exit

Now it’s time to create the restore point on the standby database:

$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 03:28:42 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
SQL> CREATE RESTORE POINT before_switchover GUARANTEE FLASHBACK DATABASE;
Restore point created.
SQL> exit

Now go to the primary database and create a restore point there as well:

$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 03:31:22 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> CREATE RESTORE POINT before_switchover GUARANTEE FLASHBACK DATABASE;
Restore point created.
SQL> exit

Switching Over

At this point, everything should be ready to switchover. Make sure to monitor the alert logs while this takes place. Firstly, just double check the configuration. On the standby:

$ dgmgrl /
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> show configuration
Configuration - DB1
Protection Mode: MaxPerformance
Members:
DB1_UNQ - Primary database
DB2_UNQ - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 99 seconds ago)

DGMGRL> show database 'DB2_UNQ'
Database – DB2_UNQ
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 125.00 KByte/s
Real Time Query: OFF
Instance(s):
INSTANCE1
INSTANCE2 (apply instance)

Database Status:
SUCCESS

Now issue the actual switchover command:

DGMGRL> switchover to 'DB2_UNQ';
Performing switchover NOW, please wait...
New primary database "DB2_UNQ" is opening...
Oracle Clusterware is restarting database "DB2_UNQ" ...
ORA-01017: invalid username/password; logon denied
Warning: You are no longer connected to ORACLE.
shut down instance "INSTANCE1" of database "DB1_UNQ"
start up instance "INSTANCE1" of database "DB1_UNQ"

DGMGRL> exit

Post Checks

NOTE: From here on down the database roles have changed. That means the commands shown in red (for primary) are now issued on the original standby host, and the commands shown in blue (for standby) are now issued on the original primary host!

Once the switchover is complete, it’s always worthwhile doing a bit of checking to make sure the configuration is as it should be, and that the new standby is receiving and applying logs. Let’s start by checking the configuration from the broker. NOTE: Depending on how quickly you check, you may receive a number of different errors which will be cleared when the configuration catches up. For example, from the new primary database DB2 (which was the standby above):

$ dgmgrl /
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> show configuration
Configuration – DB2
Protection Mode: MaxPerformance
Members:
DB2_UNQ - Primary database
Error: ORA-16810: multiple errors or warnings detected for the database

DB1_UNQ - Physical standby database
Error: ORA-12528: TNS:listener: all appropriate instances are blocking new connections
Fast-Start Failover: DISABLED
Configuration Status:
ERROR (status updated 53 seconds ago)

Don’t be too concerned by the errors. Waiting a few seconds and reissuing the “show configuration” command will show the errors have cleared:

DGMGRL> show configuration
Configuration – DB2
Protection Mode: MaxPerformance
Members:
DB2_UNQ - Primary database
DB1_UNQ - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 16 seconds ago)

DGMGRL> show database "DB1_UNQ"

Database - DB1_UNQ

Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 3.00 KByte/s
Real Time Query: ON
Instance(s):
INSTANCE1 (apply instance)
INSTANCE2

Database Status:
SUCCESS

DGMGRL>

Dropping Restore Points

Provided the role transition was successful, you can now remove the restore points from both the primary and standby. Remember that you need to restart the standby if it’s in read only mode.

$ srvctl stop database -d DB1_UNQ -o immediate
$ srvctl start database -d DB1_UNQ -o mount
$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 03:37:23 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> DROP RESTORE POINT before_switchover;
Restore point dropped.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

Restart the database in READ ONLY mode:

$ srvctl stop database -d DB1_UNQ -o immediate
$ srvctl start database -d DB1 -o "READ ONLY"

Drop the restore point on the primary as well:

$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Wed Aug 2 03:37:23 2017
Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> DROP RESTORE POINT before_switchover;
Restore point dropped.
SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

Finally, check the services are running as expected. Check the standby:

$ srvctl status service -d DB1_UNQ
Service service01 is not running.
Service service02 is running on instance(s) INSTANCE1,INSTANCE2
Service service03 is not running.

And on the new primary:

$ srvctl status service -d DB2_UNQ
Service service01 is running on instance(s) INSTANCE1,INSTANCE2
Service service02 is not running.
Service service03 is running on instance(s) INSTANCE1,INSTANCE2

All is as expected so the switchover is successful and complete!

Pete

After 22 years of working at Oracle in just about every role except Marketing and Support, I am now working as a Senior Managed Services Consultant with Red Stack Tech, specializing in Oracle Database technology, High Availability and Disaster Recovery solutions. I am also a member of the OakTable Network, and have presented at RMOUG Training Days, Hotsos Symposia, Oracle OpenWorld conferences, and other user group events. I have co-authored the Expert Oracle Enterprise Manager 12c and Practical Oracle Database Appliance books published by Apress, and am one of the authors of the Building Database Clouds in Oracle Database 12c book published by Addison Wesley.

2 Comments:

  1. Nice and succint Peter. I am glad that all is working out.

Leave a Reply

Your email address will not be published. Required fields are marked *