Hanganalyze / Librarycache contention

Last week I was called up to  help to resolve one of the production database hung issue. This totally screwed the database by all means  [ Database connections were hanging, Connections on the server using SQL*Plus as sysdba was also hanging, EM grid was spinning ] Dammn totally stuck …. you may encounter such trap in your DBA life, so here how you come out from such vicious issue.

What next !!  You need a back door entry to clutch through, where sqlplus -prelim comes as a great help.

Before stepping into how did I analyze, let me walk through what is sqlplus -prelim : This is the backdoor entry method to access database (only SGA) in a hung situation where no other methods are allowing to create new sessions in database. This will not create any session in DB rather connect to SGA directly, you may think what can be done by only connecting SGA – Yes, you can do loads of thing to show your face from the issue.

A walk through

Connected to sqlplus using -prelim option. Remember, it wont say “Connected to Oracle Database″ etc. Just sql prompt will be displayed.

$ sqlplus -prelim

SQL*Plus: Release – Production on Fri May 20 04:42:05 2011

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.

Enter user-name: / as sysdba
04:42:09 SQL>

Now you can use oradebug utility to dump SGA. You can use oradebug only in sql prompt. Type oradebug help for all commands.

SQL> oradebug setmypid
SQL> oradebug hanganalyze 12

This will produce a trace file under udump with process ID. While examining the trace file, I found there are about 120 objects waiting to get the library cache latch, got the blocking session ID and Serial#. This gives me the OS process ID for the blocker and checked for the trace files with that process ID.  Yes, I got the blocking sql from the this trace and also the service name.

Now time to chase the application man ! I have asked him the logic used in this query and came to know he is running the update statement through a loop with multiple threads .. ohh God, no surprise why there is a library cache contention. So as a immediate remedy he was okay to kill that session, kill -9 and killed the process.  Immediately everything started working and later on application got a code fix to handle this SQL. That’s it so simple :-)

Conclusion :

When you can’t logon to the database to get the information from V$SESSION (EVENT), try using oradebug command to do hanganalyze. Of course to run oradebug you should connect to sqlplus and in hang situation sqlplus -prelim is the hero. [Note, sqlplus -prelim option is present only in Oracle 10g and above]

Some MOS notes helped me:

Note: 215858.1 – “Interpreting HANGANALYZE trace files to diagnose hanging and performance problems”

Note: 310830.1 – “How to Use Oradebug to Get Trace File Name and Location”

Note: 417879.1 – “-prelim Option When Running System State Dump”

Cheers .. have a great debugging ..


Viewers of this post also viewed...