Sunday, April 1, 2012

Steps to resolve Error 1031 received logging on to the standby

Case Study.

    today, I would like to discuss about a typical issue faced on our environment.
One of our DataGuard environment was out of sync. Since the gap was too much time we decided to use the incremental recovery method and resolve the gap .

The issue faced

even though be resolved the gap, the subsequent redo transfer was not happening. When we checked in the
v$managed_process data dictionary view, we could see that RFS was not starting.

Analysis

The following observations were made.

  • The RFS process was not started in the standby which indicates that standby is not receiving any redo information from the primary.
  • In the primary alert log file we could see the errors indicating that the primary is not able to communicate with the standby instance. The error observed was "Error 1031 received logging on to the standby".
  • We tried to connect to the standby instance from primary by using the following command and it was successful.
sqlplus sys/oracle@tns.


  • the time stamp of the password file on the primary and the standby was different. This indicated the possibility of having the wrong password file in the standby.

solution


  • we copied the password file from the primary to the standby and renamed the password file in the following format  ie orapw<sid> . 
  • we restarted the media recovery process on the standby. 
Once the above steps are completed we could see that redo shipping and redo apply is taking place.

lessons learned.

For the redo transfer to take place efficiently, the password file on standby should be a copy from the primary and renamed standby.

For eg :  assume  our primary database password file is orapwdprim  and the standby instance is stand . In this case we should copy the password  file from primary and rename it to orapwdstand 

We can use v$pwd_file_users data dictionary view to check if the  password file is used

Friday, March 30, 2012

Steps to perform hang analysis in Oracle when database is inaccessible or hung state

Today, we had an issue where in which one of our rac database was hung.since the database was not responding to any of the commands the only option was to terminate the instance by the operating system commands. Before aborting the database we decided to collect as much disinformation possible so that or it will support service can find the root cause. In this can of situations Oracle recommends to collect hang analysis data for further troubleshooting.

Steps to perform an hang analysis
=========================

 Step 1
=========

   SQL> oradebug setmypid
   SQL> oradebug unlimit;
   SQL> oradebug hanganalyze 3

Step 2
========

   SQL> oradebug setmypid
   SQL> oradebug unlimit;
   SQL> oradebug dump systemstate 266

Step 3
========
   SQL> oradebug hanganalyze 3

Step 4
========
   SQL> oradebug setmypid
   SQL> oradebug unlimit;
   SQL> oradebug dump systemstate 266

Please refer metalink note 61552.1   for further details on the steps.


 Problem we faced
================
 we could not performed the above steps because the database was not accepting a new collection
 
Solution
======
 
When the database is not accepting the new collection is due to the hung status, start a preliminary correction to the database as follows
 
sqlplus -prelim " / as sysdba"
 
Then perform the steps mentioned above.
  
 
 After collecting the hang analysis details we terminated the database instance resolve the issue