Switching to a Physical Standby

Now that I’m back to doing more normal DBA work (see here for the reasons behind that), I’ll be posting some more how-to type posts from a DBA perspective, not necessarily an EM perspective. While I’m hoping these posts will be of interest to others, I also want to use them to keep track of just how I did different things so I can rinse and repeat as necessary. 😊 A special shout-out to John Evans, a colleague of mine at Red Stack Tech, who wrote the original document this post was based on.

This first post will be covering the process of performing a switchover from a primary database to a physical standby database. In this particular case, the two environments are running on Exadata machines (the first time I’ve really done anything with Exadata environments other than just monitoring them, so that’s cool!), with the RAC databases running on ASM. Obviously, this work was done for a particular customer, so your environment may vary from this one so please be prepared to take that into account (for example, the database version in this environment is 12.1.0.2). And equally obviously, no customer details will be shared in this post, so specifics like host names, database names and so forth will be deliberately obfuscated. In our initial configuration, DB1 is the primary database and DB2 is the physical standby database.

Initial Preparation

There’s a bunch of preparation work that needs to be done before the switchover process actually take place, to ensure the system is ready for switchover. Much of this can be done while the databases are up and running, and the apply process is still switched on.

Reviewing the Alert Logs

The first thing you should do is review the alert logs on both the primary and physical standby to see if there is any recent activity that might indicate a possible problem. There could be a number of errors that are problematic so you’ll need to use your best judgement on this one. If you are unsure where the trace directory is located, a simple SELECT statement can find it for you:

Checking Services

Make a note of which services run on the primary and standby sites. Since these are Exadata machines, everything has been configured in CRS to start depending on the database role. However, it’s worthwhile noting this so you can compare the services after the switchover and make sure they are running correctly (and restart them if not).
NOTE: In this environment, we need the DB_UNIQUE_NAME to be passed to the srvctl commands. You can find that by using the command:

Check the services on the primary:

Now check them on the standby:

Checking TEMP and Datafiles

The temporary files should match on both sides (i.e. they should exist).

If you find temp files on the primary that are not on the standby, the tablespaces will already be on the standby so you can just add tempfiles of the right size. On the standby:

Also, no data files should be offline on either the primary or the standby (check on both):

Checking Flashback

If flashback is not turned on, recovering from a failed role transition can be difficult and messy. It is recommended to create restore points to allow faster recovery in the event of a failed role transition (this will be done just before switchover).

Check to see if flashback is enabled and how far back it currently reaches:

Checking LOG_ARCHIVE_DEST

In this environment, LOG_ARCHIVE_DEST_2 is the one that is used for sending archived logs to the standby:

Checking SYS Passwords

Check that you can connect from the primary to the standby (and vice versa) using the SYS account. On the primary:

And on the standby:

Checking the Password Files

Check where the password files and that they are identical. Firstly, on the primary:

Since the password file is in ASM, copy it out to a filesystem:

Now do the same on the standby and compare the MD5 checksum values:

The checksums match so we know the files are identical. If the checksums did not match, you would need to copy the password file from the primary to the standby. When using ASM, you would need to use pwcopy (in ASMCMD) to move the password file from the file system to ASM.

Checking the Existing Lag

There are two ways to check for any existing lag, either through SQL*Plus or the Data Guard broker. On the standby, you can use the following SQL:

Check the “Last Applied Redo” column – the timestamp should be within a few seconds of the current wall clock time. Run the query a few times to check it is incrementing, just in case.

For the Data Guard broker, you can run that from the primary database host:

Make sure that Configuration Status is SUCCESS, and check that the transport lag is 0 and the apply lag is less than 10 minutes.

If all the above is OK, you should be ready to start the actual switchover operation, but first start tailing the alert logs on both the primary and standby sites. Where possible, use something like VNC Viewer or GNU screen so that your sessions remain open even if you get disconnected.

Switchover

Creating Restore Points

Just for ease of recoverability, it’s best to create a restore point on both the primary and the standby before starting the actual switchover operation. It’s best to do this in the order of standby first, and then the primary, which ensures the standby restore point can be used to flash back to a time before the primary. If the standby is open in read only mode, you will need to restart it in mount mode first:

Now use srvctl to stop the database and mount it:

Next, set the state to APPLY-OFF:

Now it’s time to create the restore point on the standby database:

Now go to the primary database and create a restore point there as well:

Switching Over

At this point, everything should be ready to switchover. Make sure to monitor the alert logs while this takes place. Firstly, just double check the configuration. On the standby:

Now issue the actual switchover command:

Post Checks

NOTE: From here on down the database roles have changed!

Once the switchover is complete, it’s always worthwhile doing a bit of checking to make sure the configuration is as it should be, and that the new standby is receiving and applying logs. Let’s start by checking the configuration from the broker. NOTE: Depending on how quickly you check, you may receive a number of different errors which will be cleared when the configuration catches up. For example, from the new primary database DB2 (which was the standby above):

Don’t be too concerned by the errors. Waiting a few seconds and reissuing the “show configuration” command will show the errors have cleared:

Dropping Restore Points

Provided the role transition was successful, you can now remove the restore points from both the primary and standby. Remember that you need to restart the standby if it’s in read only mode.

Restart the database in READ ONLY mode:

Drop the restore point on the primary as well:

Finally, check the services are running as expected. Check the standby:

And on the new primary:

All is as expected so the switchover is successful and complete!

Pete

After 22 years of working at Oracle in just about every role except Marketing and Support, I am now working as a Senior Managed Services Consultant with Data Intensity, specializing in Oracle Database technology, High Availability and Disaster Recovery solutions. I am also a member of the OakTable Network, and have presented at RMOUG Training Days, Hotsos Symposia, Oracle OpenWorld conferences, and other user group events. I have co-authored the Expert Oracle Enterprise Manager 12c and Practical Oracle Database Appliance books published by Apress, and am one of the authors of the Building Database Clouds in Oracle Database 12c book published by Addison Wesley.

9 Comments:

  1. Nice and succint Peter. I am glad that all is working out.

  2. Thanks for this article.
    I would like to know why the Start Option of the standby is Open and not mount ? Because it is an Active dataguard ?

    • Are you referring to the step where we are creating restore points? If so, yes, it was open read only because it was an Active Data Guard configuration. If not, can you clarify exactly where you are referring to in the original post?

      Thanks
      Pete

  3. Very useful and well organized article.

    Pete, What is the significance of setting ‘apply off’ prior to setting the guaranteed restore point on the standby ? The standby was bounced and started up in mount mode … so there shouldn’t be any apply going on ? Do you foresee any issue if the apply is not set to off ?

    Thanks.

    • I guess because the broker kicks off the recovery ? ..and if apply is not set to off, any potential issues ?

    • Correct, there shouldn’t be. But this is a multi-DBA site with DBAs spread across the globe, so it’s really just to be sure someone doesn’t get in and do things they shouldn’t. 🙂

  4. Do yoᥙ mind if I quote ɑ couplpe of your posts as
    long aѕ I provide credit аnd sources bac tto ʏour weblog?
    Mу website iis іn thee very same niche as ʏߋurs aand my usеrs woսld tгuly benefit from somе of the information yoou ρresent
    here. Ρlease ⅼеt me know if thks olay wіth
    you. Cheers!

Leave a Reply

Your email address will not be published. Required fields are marked *