Tuesday, March 18, 2008

RMAN Recovery Methods


When performing a restore operation, it is best to open two telnet sessions, one for SQL commands, and one for RMAN commands. For the rest of this document, RMAN commands will be prefaced with the RMAN> prompt, and SQL commands will be prefaced with the SQL> prompt. A $ indicates that the command is executed from the Unix prompt.


Restoring and Recovering All Datafiles


In this scenario, it is assumed that your control files are still accessible and that you have a backup taken, for example, with backup database plus archivelog;

Your first step is to make sure that the target database is shut down:


$ sqlplus "/ as SYSDBA"

SQL> shutdown abort;

ORACLE instance shut down.


Next, you need to start up your target database in mount mode. RMAN cannot restore datafiles unless the database is at least in mount mode, because RMAN needs to be able to access the control file to determine which backup sets are necessary to recover the database. If the control file isn't available, you have to restore it first (see Restoring Control Files below). Issue the STARTUP MOUNT command shown in the following example to mount the database:


SQL> startup mount;


Oracle instance started.
Database mounted.


Since backup set files are created in an RMAN-specific format, you must use RMAN to restore the datafiles. To use RMAN, connect to the target database:


$ rman target / rcvcat rcvcat/rcvcat@oemprod


The remainder of this example shows how to restore all of the datafiles of the target database. When the restore command is executed, RMAN will automatically go to its last good backup set and restore the datafiles to the state they were in when that backup set was created.

When restoring database files, RMAN reads the datafile header and makes the determination as to whether the file needs to be restored. The recovery is done by allocating a channel for I/O and then issuing the RMAN restore database command.


With Oracle9i and above, you don't need to allocate a channel explicitly. Instead, you can use the default channel mode:


RMAN> restore database;

RMAN> recover database;

SQL> alter database open;


For Oracle8i, the ALLOCATE, RESTORE, and RECOVER commands need to be enclosed in a run{} block:


RMAN> run {

allocate channel d1 type disk;

restore database;

recover database;

}

SQL> alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.
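For example, using the same catalog connection as before (a minimal sketch; substitute your own backup strategy):

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> backup database plus archivelog;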

Restoring Specific Tablespaces


In this scenario, it is assumed that your control files are still accessible and that you have a backup taken, for example, with backup database plus archivelog;


Take the tablespace that needs recovery offline, restore the tablespace, recover the tablespace, and bring the tablespace online. If you cannot take the tablespace offline, then shutdown abort the database and restore in mount mode.


First, try to take the tablespace offline:


$ sqlplus "/ as sysdba"

SQL> alter tablespace tab offline;


If this works, continue with the RMAN recovery:


$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore tablespace tab;

RMAN> recover tablespace tab;

SQL> alter tablespace tab online;


If taking the tablespace offline fails, follow these steps:


$ sqlplus "/ as SYSDBA"

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore tablespace tab;

RMAN> recover tablespace tab;

SQL> alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.

Restoring Specific Datafiles

In this scenario, it is assumed that your control files are still accessible and that you have a backup taken, for example, with backup database plus archivelog;


Take the datafile that needs recovery offline, restore the datafile, recover the datafile, and bring the datafile online. If you cannot take the datafile offline, then shutdown abort the database and restore in mount mode.


First try to take the datafile offline:


SQL> alter database datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf' offline;

If this works, continue with the RMAN recovery:


$ rman target / rcvcat rcvcat/rcvcat@oemprod


RMAN> restore datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

RMAN> recover datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

SQL> alter database datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf' online;

If taking the datafile offline fails, follow these steps:


$ sqlplus "/ as SYSDBA"

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

RMAN> recover datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

SQL> alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.


Restoring Control Files


In this scenario, it is assumed that your control files are backed up and that you have a backup taken, for example, with backup database plus archivelog;

In an ideal world you'll never use RMAN to restore a control file. But if something catastrophic happens, and you lose all control files, here are the steps for getting them back:


The following examples assume that you are using a catalog. First, here's the simplest Oracle9i syntax for restoring a control file:


$ sqlplus "/ as SYSDBA"

SQL> shutdown abort;

SQL> startup nomount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


Set the database id (DBID) with the following command. This is a 10-digit number that RMAN uses to uniquely identify this database in the recovery catalog. The number can be obtained from any previous RMAN backup log file.
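If you plan ahead, the DBID can also be recorded while the database is still open, for example with:

SQL> select dbid from v$database;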


RMAN> set dbid = xxxxxxxxxx;
RMAN> restore controlfile;

SQL> alter database mount;

SQL> alter database open;

If this fails with ...

ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


... then you must perform a recover database:
SQL> shutdown abort;
SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> recover database;

SQL> alter database open resetlogs;
RMAN> reset database;

Note that all offline archivelogs are now useless; perform a full backup as soon as possible.


Restoring Online Redologs


In this scenario, it is assumed that your control files are backed up and that you have a backup taken, for example, with backup database plus archivelog;

$ sqlplus "/ as sysdba"
SQL> shutdown abort;

SQL> startup nomount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


Set the database id (DBID) with the following command. This is a 10-digit number that RMAN uses to uniquely identify this database in the recovery catalog. The number can be obtained from any previous RMAN backup log file.


RMAN> set dbid = xxxxxxxxxx;

RMAN> restore controlfile;

SQL> alter database mount;

RMAN> restore database;

RMAN> recover database;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03002: failure of recover command at 09/28/2004 11:03:23

RMAN-06054: media recovery requesting unknown log: thread 1 seq 1 lowscn 8448414


Since the online logs were lost, complete recovery is not possible. Open the database with resetlogs to continue.

RMAN> alter database open resetlogs;


-------------------------------------IMPORTANT-------------------------------------------
During this type of recovery, if you receive error messages like this:


RMAN> restore database;

Starting restore at 11-JUL-05

using channel ORA_SBT_TAPE_1
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 07/11/2005 14:25:22
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 10 found to restore
RMAN-06023: no backup or copy of datafile 9 found to restore
RMAN-06023: no backup or copy of datafile 8 found to restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 5 found to restore
RMAN-06023: no backup or copy of datafile 4 found to restore
RMAN-06023: no backup or copy of datafile 3 found to restore
RMAN-06023: no backup or copy of datafile 2 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore

…use the following directions to recover (recreate the controlfile):


1. With the database mounted, execute 'alter database backup controlfile to trace resetlogs;'


2. Perform a shutdown abort on the database, but remain at the SQL> prompt.


3. In another telnet session, go to the udump directory to retrieve the resulting trace file and copy it to another location to edit it.


4. Edit the file and add the phrase "until cancel" to the recover database command at the end. The phrase should read "recover database until cancel using backup controlfile". Remove the "alter database open" command after the recover command. Save the file with a .sql extension (see the sketch after this list).


5. Back at the SQL> prompt, execute the modified trace file. When prompted for an archived log, type in "cancel"; the reply should be "media recovery cancelled".


6. Issue "alter database open resetlogs". The database should open after a few moments.


7. Connect to the RMAN recovery catalog and issue the "reset database" command.


8. Perform a full RMAN backup as soon as possible.
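
For reference, the edited trace file from step 4 ends up looking roughly like the following sketch. The database name, file names, sizes, and limits shown here are placeholders borrowed from this document's examples; your generated trace file will contain your actual values:

STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "AKI1" RESETLOGS ARCHIVELOG
    MAXLOGFILES 16
    MAXDATAFILES 100
LOGFILE
    GROUP 1 '/u01/oracle/db/AKI1/redo/AKI1_log1.rdo' SIZE 100M,
    GROUP 2 '/u01/oracle/db/AKI1/redo/AKI1_log2.rdo' SIZE 100M
DATAFILE
    '/u01/oracle/db/AKI1/sys/AKI1_sys1.dbf',
    '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf'
CHARACTER SET WE8ISO8859P1;
-- "until cancel" added per step 4; the "alter database open" line is removed
RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;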

----------------------------------------------------------------------------------------
Time-Based, Change-Based, or SCN-based Incomplete Recovery


Incomplete recovery uses a backup to produce a non-current version of the database. In other words, you do not apply all of the redo records generated after the most recent backup.


You usually perform incomplete recovery of the whole database in the following situations:


Media failure destroys some or all of the online redo logs.
A user error causes data loss, for example, a user inadvertently drops a table.
You cannot perform complete recovery because an archived redo log is missing.
You lose your current control file and must use a backup control file to open the database.


To perform incomplete media recovery, you must restore all datafiles from backups created prior to the time to which you want to recover and then open the database with the RESETLOGS option when recovery completes. The RESETLOGS operation creates a new incarnation of the database; in other words, a database with a new stream of log sequence numbers starting with log sequence 1.
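
As an aside, you can list the incarnations RMAN knows about from the RMAN prompt:

RMAN> list incarnation;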


NOTE – Start every RMAN incomplete recovery with the following commands:

$ sqlplus "/ as sysdba"

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


--For time-based recovery, use these commands:


RMAN> restore database until time "to_date('07/12/05 12:00:00','MM/DD/YY HH24:MI:SS')";

RMAN> recover database until time "to_date('07/12/05 12:00:00','MM/DD/YY HH24:MI:SS')";

media recovery complete.

SQL> alter database open resetlogs;


--For SCN-based recovery, use these commands:
RMAN> restore database until scn 1000;

RMAN> recover database until scn 1000;

media recovery complete.

SQL> alter database open resetlogs;


--For log sequence-based recovery, use these commands:


RMAN> restore database until sequence 9923; --Archived log sequence number

RMAN> recover database until sequence 9923; --Archived log sequence number

media recovery complete.

SQL> alter database open resetlogs;


Once the recovery has been completed, execute the following steps:

Delete prior backups with this command (from the RMAN prompt):

RMAN> delete force backup;

This command removes all prior backups from the RMAN catalog, as they can no longer be used once the database has been restarted with the resetlogs option. After completing this command, create a new RMAN backup to establish a new baseline.

Recovering Archived Logs only

In the event that you want to restore the database's archived redo logs from a desired time range, you can use the following commands:

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore ARCHIVELOG FROM TIME 'SYSDATE-1' UNTIL TIME 'SYSDATE';
or
RMAN> restore ARCHIVELOG FROM TIME "to_date('07/11/05 00:00:01','MM/DD/YY HH24:MI:SS')" UNTIL TIME 'SYSDATE';
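
If you would rather restore the archived logs somewhere other than the configured archive destination, RMAN's set archivelog destination command can be used inside a run block. A minimal sketch, with a hypothetical directory:

RMAN> run {
set archivelog destination to '/u01/oracle/restore_arch';
restore archivelog from time 'SYSDATE-1' until time 'SYSDATE';
}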



By

-6N


Keywords:

RMAN Recovery Methods Restoring and Recovering All Datafiles Recovering Datafiles ALLOCATE, RESTORE, and RECOVER commands RMAN backup Restoring Specific Tablespaces Restoring Specific Datafiles Restoring Control Files offline archivelogs Restoring Online Redologs Perform a full RMAN backup Incomplete recovery Time-Based, Change-Based, or SCN-based Incomplete Recovery

Wednesday, March 5, 2008

Hot Backup internals

Why do I have to put a tablespace into a special hot backup mode when I perform a backup with operating system 'copy' or 'cp' commands? Why can't I just copy the files?



An operating system doesn't understand how Oracle works internally. So when you ask it to copy a database file, it has no idea of the significance of the internal organisation of the file it's copying, and simply copies whichever pieces of it happen to swing under the disk head from time to time. That is unfortunate for two separate reasons.



First, Oracle uses the first database block within every data file (known as the data file header block) to store that file's checkpoint change number -and it is this number which acts as the 'timestamp' for the entire data file. Since a recovery involves applying redo which was generated after the 'time' of a data file, it's important that the checkpoint change number for a data file faithfully records the earliest age of any part of that data file. We can then roll everything forward from that earliest point, and arrive at a totally consistent file as a whole. But if the checkpoint change number happens to state a time which is later than the actual time of even just one part of the data file's contents, then Oracle will not know to roll that particular piece of the file forward, and it will be 'left behind'. You'll have a corrupt data file on your hands, and an unrecoverable tablespace -and if it's the SYSTEM tablespace we're talking about, you've just lost your entire database.



For example, suppose my data file's checkpoint change number reads '1000', but there are actually blocks of data in that file which are actually from time '900'. During a recovery, Oracle will think it should apply redo from time 1000 onwards (because that's what the data file's header block tells it). But that leaves some blocks very out-of-synch with the rest of the file.



So the data file's header block is crucial in telling us the point from which we need to start rolling forward. But if you are doing a hot backup, then the instance (specifically, the CKPT background process) may need to update the contents of that block whilst the file is being copied. That would not be a problem if the operating system was smart enough to know that the data file header block should be the first block copied... but, because it doesn't understand the internal organisation of an Oracle datafile, it doesn't know that. It may therefore copy that block after CKPT has updated it... and the backup file is therefore immediately rendered useless for recovery purposes.



Now that is our first problem with just copying hot data files: parts of a data file being out of synchronisation with each other, and with an unreliable checkpoint change number in the data file header, mean the data file as a whole is unusable. But the same sort of problem also arises within an Oracle block itself. We tend to think of an Oracle block as a single entity -but it isn't. Physically, an Oracle block is made up of multiple operating system blocks: for example, an 8K database block is usually comprised of 16 separate O/S blocks (because most O/S blocks are 512 bytes in size). For an Oracle block to be readable by an instance, all 16 of those O/S blocks must be synchronised, or from the same 'database time'. Unfortunately, if it's an operating system utility which is copying the database blocks, that utility has no idea that all 16 O/S blocks need to be kept synchronised: it just copies whichever of them happen to turn up under the disk head at any one moment.



But if people are actually modifying the contents of an Oracle block during the copy operation (which is definitely possible during a hot backup, of course), then you could conceivably end up with a copy of what we think of as a single Oracle block that is actually from 16 different database times. Such a block copy is termed a fractured block, and it's unusable in a recovery scenario.



So if you just copy your datafiles, whilst they're in use, you will end up with inconsistent block images, and a datafile header block that does not correctly describe the time of the entire datafile. Such a copy is useless.



OK, but what precisely does the begin backup command do to prevent these problems arising?



Two separate things. To deal with the data file header block not accurately describing the earliest time of the contents of the data file, the command causes the header block to be locked from all further updates by the CKPT process. If the data file header block is locked, so that it records the database time from just before it started to be copied, it must tell a valid point from which to start applying redo in a recovery scenario.



For example, if I lock the header block when it contains a checkpoint change number of '1000'; and if I then take half an hour to copy the entire file; then there are bound to be blocks within the resulting backup file from time 1100, 1200, 1400 and so on. But there will not be blocks from a time before 1000, and therefore no bits of the file would get 'forgotten' during a recovery.



Second: to deal with the fractured block issue, the begin backup command sets a flag which tells Oracle that the first piece of DML to modify any part of a block whilst its file is being copied will cause the entire block to be written to the redo logs. This ensures that whilst the image of the block in the backup file is fractured, and hence useless, there is a clean image of the block in the redo stream -which we can obviously use to replace the fractured version during a recovery operation.



I think I understand. Can you show me these two things happening?



Sure. The locked datafile header block is easiest to demonstrate. First, I issue this piece of SQL:





SELECT T.NAME, H.FILE#, H.CHECKPOINT_CHANGE#

FROM V$TABLESPACE T, V$DATAFILE_HEADER H

WHERE T.TS#=H.TS#;



NAME        FILE#  CHECKPOINT_CHANGE#
SYSTEM          1              121923
UNDOTBS         2              121923
INDX            3              121923
TOOLS           4              121923
USERS           5              121923



So, all tablespaces (and all data files) start off synchronised with the same checkpoint change number. Now let's go through the motion of beginning a hot backup and see what happens:



alter tablespace users begin backup;



select t.name, h.file#, h.checkpoint_change#

from v$tablespace t, v$datafile_header h

where t.ts#=h.ts#;





NAME        FILE#  CHECKPOINT_CHANGE#
SYSTEM          1              121923
UNDOTBS         2              121923
INDX            3              121923
TOOLS           4              121923
USERS           5              121925





Do you notice how the USERS tablespace is now actually slightly ahead of the time the rest of the database has gotten to? That's because, as a tablespace is put into hot backup, any dirty buffers which belong to that tablespace are flushed by DBWR from the buffer cache back down to disk. CKPT then updates the time in that file's header block. In short, putting a tablespace into hot backup causes that tablespace (alone) to be checkpointed.
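
Incidentally, you can see which data files are currently in backup mode by querying the V$BACKUP view; a file in hot backup mode shows a STATUS of ACTIVE:

select file#, status from v$backup;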





That datafile doesn't stay ahead of the rest of the database for long, however. If we simulate some transactional activity on the database, we can check the timestamps again:



alter system switch logfile;

alter system switch logfile;

alter system switch logfile;

alter system switch logfile;



At each log switch (which would normally be caused by users continuing to perform transactions, of course), a system-wide checkpoint is issued. So if we now check the data file headers again, we should see some changes:



select t.name, h.file#, h.checkpoint_change#

from v$tablespace t, v$datafile_header h

where t.ts#=h.ts#;



NAME        FILE#  CHECKPOINT_CHANGE#
SYSTEM          1              121939
UNDOTBS         2              121939
INDX            3              121939
TOOLS           4              121939
USERS           5              121925





You might need to look at those numbers carefully to spot the effect, but if you do, you'll see that file 5 is no longer ahead of the pack. In fact, it's still stuck very firmly at timestamp 121925, whilst the other files have moved on to later times. Of course, the checkpoint change number reported here for file 5 is a complete lie in a sense: the contents of file 5 will be just as much at time 121939 as the rest of the database. But the 'begin backup' has locked the header so that it reports the time the file started being backed up, not the time of its contents.



OK. The data file header issue I can see. What about the block-sized redo thing?





That one's a bit harder to demonstrate. We'll first have to run an Oracle script that makes it possible to see how much redo a transaction generates. Then we can perform the same transaction when the data file is not being backed up and compare it with when it is.





First, I connect to my database as a normal user -in this case, as Scott. I then run the Oracle-supplied UTLXPLAN.SQL script. And as SYS, I then grant Scott the DBA role as a quick-fix way of allowing Scott to make use of the redo-measuring features that the UTLXPLAN script makes possible:





sqlplus scott/tiger

@%ORACLE_HOME%\rdbms\admin\utlxplan.sql



connect / as sysdba

grant dba to scott;

connect scott/tiger



Now we can test the amount of redo a transaction generates normally:





set autotrace trace stat



update emp set sal=900 where ename='MILLER';



Statistics

339 recursive calls

3 db block gets

65 consistent gets

5 physical reads

188 redo size

631 bytes sent via SQL*Net to client

548 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

8 sorts (memory)

0 sorts (disk)

1 rows processed





The set autotrace trace stat command means we get to see a short report about what went on inside Oracle to process the update command I issued. As you can see, the update of Miller's salary generated 188 bytes of redo. That's what this particular update generates “normally”. Now, let me put the tablespace where the EMP table lives into hot backup and issue exactly the same update:



alter tablespace users begin backup;



update emp set sal=900 where ename='MILLER';



Statistics

0 recursive calls

2 db block gets

3 consistent gets

0 physical reads

8740 redo size

632 bytes sent via SQL*Net to client

548 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

1 rows processed



Suddenly, the exact same transaction now generates 8,740 bytes of redo. That's an extra 8,552 bytes, or a 4500% increase! The reason is not hard to find: because the tablespace is now in hot backup mode, the first change to the block caused the entire block image to be written to the redo stream, on top of the normal redo change vectors.
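
And, of course, once the operating system copy of the data file has finished, the tablespace is taken back out of backup mode, which unlocks the header block and switches off the whole-block logging:

alter tablespace users end backup;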

Keywords:

Hot Backup internals special hot backup mode suppose my data file's checkpoint CKPT has updated it oracle hot backup Oracle hotbackup Internals of Oracle hotbackup Internals of Oracle hot backup how hot backup works what is happening in hot backup begin backup alter tablespace set autotrace trace stat system-wide checkpoint

Oracle Memory Tuning

Redo Log Buffer Latches
Database Buffer Cache Size
Shared Pool Size

Redo Log Buffer Latches

When a transaction is ready to write its changes to the redo log, it first has to grab the Redo Allocation Latch, of which there is only one, to keep others from writing to the log at the same time. If someone else has that latch, it has to wait for the latch, resulting in a "miss".

Once it grabs that latch, if the change is larger than log_small_entry_max_size bytes and if your server has multiple CPU's, it then tries to grab a Redo Copy Latch, of which there can be up to 2 times the number of CPU's, which would allow it to release the Redo Allocation Latch for someone else to use. If none of them are available, resulting in an "immediate miss", it will not wait for a Redo Copy Latch (thus, the "immediate"), but, instead, hangs on to the Redo Allocation Latch until the change is written.

Oracle keeps statistics for these latches in v$latch, including the number of gets and misses for the Redo Allocation Latch and the number of immediate gets and immediate misses for the Redo Copy Latches, which are cumulative values since instance startup. If you've got a 100% hit ratio for either of those latch types, that's a good thing. It just means that all of your transactions were able to grab and use the latch without retrying. It's when you get below a 99% hit ratio that you need to start looking out. The following sql figures the current hit ratios for those latches:


column latch_name format a20
select name latch_name, gets, misses,
round(decode(gets-misses,0,1,gets-misses)/
decode(gets,0,1,gets),3) hit_ratio
from v$latch where name = 'redo allocation';

column latch_name format a20
select name latch_name, immediate_gets, immediate_misses,
round(decode(immediate_gets-immediate_misses,0,1,
immediate_gets-immediate_misses)/
decode(immediate_gets,0,1,immediate_gets),3) hit_ratio
from v$latch where name = 'redo copy';

If your Redo Allocation Latch hit ratio consistently falls below 99%, and if you have a multi-CPU machine, you can lower the value for log_small_entry_max_size (see below) in your init.ora file (ours is currently 800 bytes, but, maybe 100 or so bytes may be better - you'll have to try out different values over time), which says that any change smaller than that will hang onto the Redo Allocation Latch until Oracle is finished writing that change. Anything larger than that grabs a Redo Copy Latch, if currently available, and releases the Redo Allocation Latch for another transaction to use.

If your Redo Copy Latch hit ratio consistently falls below 99%, and if you have a multi-CPU machine, you can raise the value of log_simultaneous_copies in your init.ora file up to twice the number of CPU's to provide more Redo Copy Latches (there is only one Redo Allocation Latch, so it is at a premium). Remember that you have to shut down your database instance and restart it to reread the new parameter values in the init.ora file ($ORACLE_HOME/dbs/initSID.ora). The following sql shows the current values for those associated parameters:

column name format a30
column value format a10
select name,value from v$parameter where name in ('log_small_entry_max_size','log_simultaneous_copies',
'cpu_count');
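
If you do end up adjusting both parameters, the relevant init.ora fragment might look something like this sketch (the values shown are examples only; tune them against your own v$latch numbers):

# $ORACLE_HOME/dbs/initSID.ora
log_small_entry_max_size = 100   # redo entries above this size try for a Redo Copy Latch
log_simultaneous_copies = 8      # e.g. 2 x 4 CPUs; provides more Redo Copy Latches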


Database Buffer Cache Size


The Database Buffer Cache is part of the Shared Global Area (SGA) in memory for a single database instance (SID) and holds the blocks of data and indexes that you and everyone else is currently using. It may even contain multiple copies of the same data block if, for example, more than one transaction is making changes to it but not yet committed, or, if you are looking at the original copy (select) and someone else is looking at their modified but uncommitted copy (insert, update, or delete). The parameters db_block_buffers and db_block_size in your init.ora file determine the size of the buffer cache. db_block_size, in bytes, is set at database creation, and cannot be changed (unless you recreate the database from scratch), so, the only thing that you can adjust is the number of blocks in db_block_buffers (one buffer holds one block).

The Cache Hit Ratio shows how many blocks were already in memory (logical reads, which include "db block gets" for blocks you are using and "consistent gets" of original blocks from rollback segments that others are updating) versus how many blocks had to be read from disk ("physical reads"). Oracle recommends that this ratio be at least 80%, but, I like at least 90% myself. The ratio can be obtained from values in v$sysstat, which are constantly being updated and show statistics since database startup (it is only accessible from a DBA user account). You will get a more representative sample if the database has been running several hours with normal user transactions taking place. The Cache Hit Ratio is determined as follows:

select (1-(pr.value/(dbg.value+cg.value)))*100
from v$sysstat pr, v$sysstat dbg, v$sysstat cg
where pr.name = 'physical reads'
and dbg.name = 'db block gets'
and cg.name = 'consistent gets';

If you have a low Cache Hit Ratio, you can test to see what the effect of adding buffers would be by putting "db_block_lru_extended_statistics = 1000" in the init.ora file, doing a shutdown and startup of the database, and waiting a few hours to get a representative sample. Oracle determines how many Additional Cache Hits (ACH) would occur for each query and transaction for each of the 1000 buffer increments (or whatever other maximum value you might want to try out), and places them into the x$kcbrbh table, which is only accessible from user "sys". To measure the new Cache Hit Ratio with, for example, 100 extra buffers, determine ACH as follows:

select sum(count) "ACH" from x$kcbrbh where indx < 100;

and plug that value into the Cache Hit Ratio formula as follows:

select (1-((pr.value-&ACH)/(dbg.value+cg.value)))*100
from v$sysstat pr, v$sysstat dbg, v$sysstat cg
where pr.name = 'physical reads'
and dbg.name = 'db block gets'
and cg.name = 'consistent gets';

If the ratio originally was lower than 80% and is now higher with ACH, you may want to increase db_block_buffers by that number of extra buffers, restarting your database to put the increase into effect. Be sure to try several values for the number of extra buffers to find an optimum for your work load. Also, remove db_block_lru_extended_statistics from your init.ora file before restarting your database to stop gathering statistics, which tends to slow down the transaction time. (Removing that clears the x$kcbrbh table.) Also, make sure that your server has enough memory to accommodate the increase!

If you are running really tight on memory, and the Cache Hit Ratio is running well above 80%, you might want to check the effect of lowering the number of buffers, which would release Oracle memory that could then be used by other processes, but would also potentially slow down database transactions. To test this, put "db_block_lru_statistics = true" in your init.ora file and restart your database. This gathers statistics for Additional Cache Misses (ACM) that would occur for each query and transaction for each of the buffer decrements up to the current db_block_buffers value, placing them into the x$kcbcbh table, also only accessible from user "sys". To measure the new Cache Hit Ratio with, for example, 100 fewer buffers, determine ACM as follows:

select sum(count) "ACM" from x$kcbcbh
where indx >= (select max(indx)+1-100 from x$kcbcbh);

and plug that value into the Cache Hit Ratio formula as follows:

select (1-((pr.value+&ACM)/(dbg.value+cg.value)))*100
from v$sysstat pr, v$sysstat dbg, v$sysstat cg
where pr.name = 'physical reads'
and dbg.name = 'db block gets'
and cg.name = 'consistent gets';

If the ratio is still above 80%, you may want to decrease db_block_buffers by that number of fewer buffers, restarting your database to put the decrease into effect. Be sure to try several values for the number of fewer buffers to find an optimum for your work load. Also, remove db_block_lru_statistics from your init.ora file before restarting your database to stop gathering statistics, which tends to slow down the transaction time. (Removing that clears the x$kcbcbh table.)

I have three scripts which you can use to figure your instance's optimum number of db_block_buffers. The first computes the current ratio for the database buffer cache, and can be run from any DBA account. The second computes the resulting ratio for an increase in the buffer cache size of a given number of buffer blocks (figuring ACH itself); it must be run from user "sys", after a representative sampling time with db_block_lru_extended_statistics in place. The third computes the resulting ratio for a decrease of a given number of buffer blocks (figuring ACM itself); it must be run from user "sys", after a representative sampling time with db_block_lru_statistics in place.

Shared Pool Size


The Shared Pool is also part of the Shared Global Area (SGA) in memory for a single database instance (SID) and holds the Library Cache with the most recently used SQL statements and parse trees along with PL/SQL blocks, and the Data Dictionary Cache with definitions of tables, views, and other dictionary objects. Both of those sets of cached objects can be used by one or more users, and are aged out (Least Recently Used) as other objects need the space. (You can pin large frequently-used objects in the Shared Pool for performance and other reasons, but, I won't go into that here.)

There are several ratios that you can check after a representative sample time that may indicate that you need to enlarge the shared pool, which is set by the shared_pool_size parameter in your init.ora file and defaults to 3500000 (3.5 Meg). One indicator is the Library Cache Get Hit Ratio, which shows how many cursors are being shared (SQL statements (gets) which were already found and parsed (gethits) in the shared pool, with no parsing or re-parsing needed), and is determined by:

select gethits,gets,gethitratio from v$librarycache
where namespace = 'SQL AREA';

If the gethitratio is less than 90%, you should consider increasing the shared pool size. Another indicator is the reloads per pin ratio, which shows how many parsed statements (pins) have been aged out (reloaded) of the shared pool for lack of space (ideally 0), and is determined by:

select reloads,pins,reloads/pins from v$librarycache
where namespace = 'SQL AREA';

If the reloads/pins ratio is more than 1%, you should consider increasing the shared pool size. A third indicator, which is not as important as the first two, is the dictionary object getmisses per get ratio, which shows how many cached dictionary object definitions in the dictionary cache are encountering too many misses (aged out?), and is determined by:

select sum(getmisses),sum(gets),sum(getmisses)/sum(gets)
from v$rowcache;

If the getmisses/gets ratio is more than 15%, you should consider increasing the shared pool size.

If these ratios indicate that your shared pool is too small, you can estimate the size of the shared pool by doing the following. Set the shared_pool_size to a very large number, maybe a fourth or more of your system's available memory, depending on how many other instances and processes that you have running that are also using memory, then shutdown and startup your database and let it run for a representative time (like all day or when a large batch job is running that you want to accommodate). Then figure the memory required for packages and views, the memory required for frequently used SQL statements, and the memory required for users' SQL statements executed, as shown below:

select sum(sharable_mem) "Packages/Views" from v$db_object_cache;
select sum(sharable_mem) "SQL Statements" from v$sqlarea
where executions > 5;

select sum(250 * users_opening) "SQL Users" from v$sqlarea;

Then, add the above three numbers and multiply the results by 2.5. Use this estimated size as a guideline for the value for shared_pool_size, changing that parameter to the estimated size or back to the original size and doing another shutdown/startup to put the value into effect.

select sum(a.spspv) "Packages/Views",
       sum(a.spssql) "SQL Statements",
       sum(a.spsusr) "SQL Users",
       round((sum(a.spspv) + sum(a.spssql) + sum(a.spsusr)) * 2.5, -6)
         "Estimated shared_pool_size"
from (select sum(sharable_mem) spspv, 0 spssql, 0 spsusr
        from v$db_object_cache
      union all
      select 0, sum(sharable_mem), 0 from v$sqlarea
       where executions > 5
      union all
      select 0, 0, sum(250 * users_opening) from v$sqlarea) a;

Keywords:

Oracle Memory Tuning Redo Log Buffer Latches Database Buffer Cache Size Shared Pool Size Redo Allocation Latch v$latch 99% hit ratio Shared Global Area Database Buffer Cache db_block_size physical reads v$sysstat db block gets db_block_lru_statistics shared_pool_size librarycache pins cached dictionary object definitions shared pool advisory