INSTANCE RECOVERY – Oracle database have inherit feature of recovering from instance failures automatically. SMON is the background process which plays a key role in making this possible. Though this is an automatic process that runs after the instance faces a failure, it is very important for every DBA to understand how is it made possible. In this blog, we are going to understand the same with the help of a flow diagram and a demonstration.
Instance recovery is the way of bringing back the transactions which are not yet written to the disk by DBWR at the moment of instance failure. Instance failures are possible in multiple ways: hardware issues, power glitches, network outages, abnormal instance shutdowns (SHUTDOWN ABORT). Database startup after any of these failures will automatically trigger the instance recovery. Let us now get a clear understanding of this process with the help of the flow diagram and with hands-on activity as well. This exercise would similar to the one we did in “Why should we configure limits.conf for Oracle database?” blog.
Consider the image above, let us get into details of this flow.
- At 9:00 AM consider your database is working normally.
- At 9:10 AM think of database running 30 transactions in which 20 are committed and 10 are uncommitted, they are yet to be decided by user process to commit or rollback.
- While this transaction is still running, let us consider that instance is terminated at 9:11 AM.
- Now as we bring up the instance, a perfect recovery is the one that makes sure 20 transactions are not lost and 10 uncommitted transactions are lost. Oracle does it perfectly. Hoping so, startup up the instance normally.
- Once database mounts, then the initial phases of recovery begin.
- With the help of CKPT background process, instance recognizes any loss of transactions with the compatibility checks of SCN numbers between control files and data files.
- All the transactions which are lost without updated to the datafiles are re-ran with the help of REDO Which means that all 30 transactions are ran as on 9:10 AM now. This action is called as ROLL FORWARD.
- Database then checks the accessibility of UNDO segments and opens the database.
- On opening the database, now the 10 transactions which are not committed are rolled back making the database consistent with 20 committed transactions with the help of UNDO This action is called as ROLL BACK.
- Now that instance recovery is completed, Database is not recovered with 20 committed transactions. But yes, blocks would be updated into datafiles asynchronously by DBWR as usual.
We will now see the demonstration of the same to relate it with the reality.
- Database is working normally now.
SQL> select instance_name from v$instance; INSTANCE_NAME —————- ORADB
- Let us terminate the instance now.
[oracle@OEL-12c trace]$ ps -ef | grep smon oracle 4029 1 0 Oct21 ? 00:00:03 ora_smon_ORADB oracle 26131 1172 0 16:32 pts/0 00:00:00 grep –color=auto smon [oracle@OEL-12c trace]$ kill -9 4029
- Let us now startup the database and monitor the alert log file. Below is the important output of the alert log file.
Started redo scan Sun Oct 22 16:34:11 2017 Completed redo scan read 2450 KB redo, 553 data blocks need recovery Sun Oct 22 16:34:11 2017 Started redo application at Thread 1: logseq 11, block 15046 Sun Oct 22 16:34:11 2017 Recovery of Online Redo Log: Thread 1 Group 2 Seq 11 Reading mem 0 Mem# 0: /oradata/ORADB/onlinelog/o1_mf_2_drl8n920_.log Mem# 1: /oradata/fra/ORADB/onlinelog/group2_member2.log Sun Oct 22 16:34:11 2017 Completed redo application of 1.92MB Sun Oct 22 16:34:11 2017 Completed crash recovery at Thread 1: logseq 11, block 19947, scn 3934834
Alert log clearly says that REDO has been applied with the transactions which are not committed when instance terminated.
- As soon as the instance is opened, roll back operation is performed and below is the output of the same.
SQL> SELECT r.NAME “RB Segment Name”, dba_seg.size_mb, DECODE(TRUNC(SYSDATE – LOGON_TIME), 0, NULL, TRUNC(SYSDATE – LOGON_TIME) || ‘ Days’ || ‘ + ‘) || TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), ‘SSSSS’), ‘HH24:MI:SS’) LOGON, v$session.SID, v$session.SERIAL#, p.SOSID, v$session.USERNAME, v$session.STATUS FROM v$lock l, v$process p, v$rollname r, v$session, (SELECT segment_name, ROUND(bytes/(1024*1024),2) size_mb FROM dba_segments WHERE segment_type = ‘TYPE2 UNDO’ ORDER BY bytes DESC) dba_seg WHERE l.SID = p.pid(+) AND v$session.SID = l.SID AND TRUNC (l.id1(+)/65536)=r.usn AND l.TYPE(+) = ‘TX’ AND l.lmode(+) = 6 AND r.NAME = dba_seg.segment_name ORDER BY size_mb DESC; RB Segment Name SIZE_MB —————————— ———- LOGON SID SERIAL# ——————————————————– ———- ———- SOSID USERNAME STATUS ———————— —————————— ——– _SYSSMU3_3285411314$ 2.13 00:00:09 19 41123 26475 ACTIVE
I love to see your comments!
Feel free to give additional information in the comments, that I can validate and include in blog posts or update this as well.