That demned elusive PQ slave

With apologies to Emma Orczy for stealing a line from “The Scarlet Pimpernel” … 🙂

Introduction

We ran into a strange situation at a customer site this morning. A customer asked us to kill two stuck sessions on one of their 11.2.0.4 RAC Exadata environments, normally something they have sufficient knowledge and access to be able to handle themselves. Well this time round, every time they issued an “ALTER SYSTEM KILL SESSION ‘xxx, yyyyy'” command they were getting an error “User session ID does not exist”. Normally, you would expect this for one of a couple of reasons:

  • They were connected to the wrong instance (remember this is Exadata and RAC) or had forgotten to add the inst_id parameter to the command as well – not the case this time.
  • The “stuck” session was just slowly progressing and had finished of its own accord – again, not the case this time.

The two stuck sessions (SIDs 165 and 529) were still viewable in V$SESSION, but we couldn’t kill them because the serial# (the second parameter to the ALTER SYSTEM KILL SESSION command) was actively changing, faster than we could select the information from V$SESSION and kill the session. This post walks through how we resolved the problem.

Investigating the Issue

The first step to finding out what was going on was identifying the relevant database / instance information. The customer had provided us with the connection string (customer information obfuscated for obvious reasons!):

But there are 8 nodes in this Exadata environment using that “host05-scan” bit. Our records showed that particular instance was running on the 3rd node of the cluster, so we connected to that instance first to find the DB_UNIQUE_NAME:

Now we have the DB_UNIQUE_NAME, we can determine where the different instances are:

Now the customer had told us the SIDs that were problematic, so stealing a script from Tim Hall’s forever useful website and modifying it to see just those sessions, we could see this:

Fixing the Issue

So now we know a couple more things about these mysterious sessions:

  • They’re on instance2, which we can tell by comparing the program (oracle@hostdb0502) with the output of the srvctl command above
  • They’re actually both parallel query processes (P096 and P288)

Well, that should make it easy. We can just connect to the second node and kill those processes off at the OS level, right?

Unfortunately, wrong. When we went and looked at the second node, neither of those two process IDs existed. We could see processes P094 and P096, as well as P287 and P289, but those middle processes were just nowhere to be found. The situation just gets weirder and weirder!

Next, we decided to delve deeper into the PQ side, using another carefully stolen script, this time from Amit Bansal, and it showed us a way out:

Can you see the way out?

If you look carefully, you can see our sessions with SIDs 529 and 165, both active. More importantly, you can see there are two killed SQL*Plus sessions, sessions 871 and 1786. If these sessions are already killed in the database, is it possible that one of them was the master of our two missing PQ slaves? Well, since they’re already killed off in the database, we can kill the matching OS processes (if they were still there, of course!) without impacting any users. We checked those two SIDs using Tim’s script above, found the two SPIDs to match, and found those processes DID still exist at the OS level. We killed off one – nothing happened. We killed off the other, and lo and behold the sessions with SIDs 529 and 165 magically disappeared. Hey presto! Issue solved!

Pete

After 22 years of working at Oracle in just about every role except Marketing and Support, I am now working as a Senior Managed Services Consultant with Data Intensity, specializing in Oracle Database technology, High Availability and Disaster Recovery solutions. I am also a member of the OakTable Network, and have presented at RMOUG Training Days, Hotsos Symposia, Oracle OpenWorld conferences, and other user group events. I have co-authored the Expert Oracle Enterprise Manager 12c and Practical Oracle Database Appliance books published by Apress, and am one of the authors of the Building Database Clouds in Oracle Database 12c book published by Addison Wesley.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.