Monday, August 25, 2008

Oracle RAC installation on Solaris SPARC 64 bit

Keyword:

Oracle RAC Installation RAC Installation

Installing Oracle RAC Installing

Oracle RAC on Solaris



Few weeks back i did a 2 node oracle RAC installation
The Machines were Soalris 10 SPARC 64 bit (Sun-Fire-T2000).
The Shared storage was NAS
Even though Solaris 10 uses resource control, the kernel parameters were added in the /etc/system (#Metalink: Note:367442.1)

The group OINSTALL and user ORACLE were created on both nodes

Few parameters were tuned in /etc/rc2.d/S99nettune

bash-3.00# more /etc/rc2.d/S99nettune
#!/bin/sh
ndd -set /dev/ip ip_forward_src_routed 0
ndd -set /dev/ip ip_forwarding 0
ndd -set /dev/tcp tcp_conn_req_max_q 16384
ndd -set /dev/tcp tcp_conn_req_max_q0 16384
ndd -set /dev/tcp tcp_xmit_hiwat 400000
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_ip_abort_interval 60000
ndd -set /dev/tcp tcp_rexmit_interval_initial 4000
ndd -set /dev/tcp tcp_rexmit_interval_max 10000
ndd -set /dev/tcp tcp_rexmit_interval_min 3000
ndd -set /dev/tcp tcp_max_buf 4194304
ndd -set /dev/tcp tcp_maxpsz_multiplier 10
#Oracle Required
ndd -set /dev/udp udp_recv_hiwat 65535
ndd -set /dev/udp udp_xmit_hiwat 65535

Check /etc/system is readable by ORACLE (else RDBMS installation will fail)

-rw-r–r– 1 root root 2561 Apr 17 16:03 /etc/system
Checked the system config on both nodes
For RAM
/usr/sbin/prtconf | grep “Memory size”
Memory size: 8064 Megabytes

For SWAP

/usr/sbin/swap -s
total: 4875568k bytes allocated + 135976k reserved = 5011544k used, 9800072k available
For /tmp
df -h /tmp

Filesystem size used avail capacity Mounted on

swap 9.4G 31M 9.4G 1% /tmp

For OS

/bin/isainfo -kv
64-bit sparcv9 kernel modules

For user

id -a #both UID and GID of user ORACLE should be same on both nodes
uid=300(ORACLE) gid=300(oinstall) groups=300(oinstall),301(dba),503(tms),504(mscat),102(dwh)

User nobody should exist

id -a nobody

uid=60001(nobody) gid=60001(nobody) groups=60001(nobody)
I had the below entries in the /etc/hosts on both nodes
[]cat /etc/hosts

#Public:
3.208.169.203 ownserver01 ownserver01ipmp0 loghost
3.208.169.207 ownserver02 ownserver02ipmp0 loghost
#Private:
10.47.2.82 ownserver01ipmp1 # e1000g1 -Used this while installing cluster
10.47.2.85 ownserver02ipmp1 # e1000g1 -Used this while installing cluster
10.47.2.76 ownserver01ipmp2 # e1000g0
10.47.2.79 ownserver02ipmp2 # e1000g0
#Vip:
3.208.169.202 ownserverv01
3.208.169.206 ownserverv02

All the interfaces had their ipmp groups.

Confirmed that the interface names of both Private and Public are same across the nodes.
e1000g3 was the Public Interface on both nodes.
e1000g0 and e1000g1 were the Private Interface on both nodes.
-I had 2 interfaces for Private Interconnect, of which i used e1000g1 during the cluster installation.
-The interface names for e1000g1 on each node were ownserver01ipmp1 and ownserver02ipmp1

Below is the ‘ifconfig -a‘ from Ownserver01

ownserver01 [SHCL1DR1]$ ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 10.47.2.76 netmask ffffffe0 broadcast 10.47.2.95
groupname ipmp2
e1000g0:1: flags=9040843 mtu 1500 index 2
inet 10.47.2.77 netmask ffffffe0 broadcast 10.47.2.95
e1000g0:2: flags=9040842 mtu 1500 index 2
inet 10.47.2.77 netmask ff000000 broadcast 10.255.255.255
e1000g1: flags=1000843 mtu 1500 index 3
inet 10.47.2.82 netmask ffffffe0 broadcast 10.47.2.95
groupname ipmp1
e1000g1:1: flags=9040843 mtu 1500 index 3
inet 10.47.2.83 netmask ffffffe0 broadcast 10.47.2.95
e1000g1:2: flags=9040842 mtu 1500 index 3
inet 10.47.2.83 netmask ff000000 broadcast 10.255.255.255
e1000g2: flags=1000843 mtu 1500 index 4
inet 10.47.2.11 netmask ffffffc0 broadcast 10.47.2.63
groupname ipmp3
e1000g2:1: flags=9040843 mtu 1500 index 4
inet 10.47.2.12 netmask ffffffc0 broadcast 10.47.2.63
e1000g2:2: flags=9040842 mtu 1500 index 4
inet 10.47.2.12 netmask ff000000 broadcast 10.255.255.255
e1000g3: flags=1000843 mtu 1500 index 5
inet 3.208.169.203 netmask ffffffc0 broadcast 3.208.169.255
groupname ipmp0
e1000g3:1: flags=9040843 mtu 1500 index 5
inet 3.208.169.204 netmask ffffffc0 broadcast 3.208.169.255
e1000g3:2: flags=9040842 mtu 1500 index 5
inet 3.208.169.204 netmask ff000000 broadcast 3.255.255.255
nxge0: flags=69040843 mtu 1500 index 6
inet 10.47.2.13 netmask ffffffc0 broadcast 10.47.2.63
groupname ipmp3
nxge1: flags=69040843 mtu 1500 index 7
inet 3.208.169.205 netmask ffffffc0 broadcast 3.208.169.255
groupname ipmp0
nxge2: flags=69040843 mtu 1500 index 8
inet 10.47.2.78 netmask ffffffe0 broadcast 10.47.2.95
groupname ipmp2
nxge3: flags=69040843 mtu 1500 index 9
inet 10.47.2.84 netmask ffffffe0 broadcast 10.47.2.95
groupname ipmp1



From the above output its clear that On Ownserver01

e1000g0 is 10.47.2.76 and e1000g1 is 10.47.2.82 - For Private Interconnect
e1000g2 is 10.47.2.11 - For Shared Storage (NAS)
e1000g3 is 3.208.169.203 - For Public

On Ownserver02

e1000g0 is 10.47.2.79 and e1000g1 is 10.47.2.85 - For Private Interconnect
e1000g2 is 10.47.2.14 - For Shared Storage (NAS)
e1000g3 is 3.208.169.207 - For Public

Checked for SSH and SCP in /usr/local/bin/

The cluster verification utility(runcluvfy.sh ) checks for scp and ssh in /usr/local/bin/.
Create soft links of ssh and scp in /usr/local/bin/ if they are not there.
cd /usr/local/bin/
ls -l
lrwxrwxrwx 1 root root 12 Apr 25 16:57 /usr/local/bin/scp -> /usr/bin/scp
lrwxrwxrwx 1 root root 12 Apr 25 16:57 /usr/local/bin/ssh -> /usr/bin/ssh

Checked SSH equivalency between the nodes

ownserver01 [SHCL1DR1]$ ssh ownserver01 date
ssh_exchange_identification: Connection closed by remote host
ownserver01 [SHCL1DR1]$
mkdir ~/.ssh
chmod 700 ~/.ssh
/usr/bin/ssh-keygen -t rsa
usr/bin/ssh-keygen -t dsa
touch ~/.ssh/authorized_keys
ssh ownserver01 cat /home/oracle/.ssh/id_rsa.pub >>authorized_keys
Passowrd: *****
ssh ownserver01 cat /home/oracle/.ssh/id_dsa.pub >>authorized_keys
Password: *****
ssh ownserver02 cat /home/oracle/.ssh/id_rsa.pub >>authorized_keys
Password: *****
ssh ownserver02 cat /home/oracle/.ssh/id_dsa.pub>>authorized_keys
Password:*****
From Ownserver01
ownserver01 [SHCL1DR1]$ ssh OWNSERVER01 date
ownserver01 [SHCL1DR1]$ ssh OWNSERVER02 date
From Ownserver02
ownserver02 [SHCL1DR2]$ ssh OWNSERVER01 date
ownserver02 [SHCL1DR2]$ ssh OWNSERVER02 date
The time on both nodes were same at any time.
I made sure that from Ownserver01 i could SSH to ownserver02 and also to ownserver01 itself and the same from Ownserver02 too.

Note: also mv /etc/issue to /etc/issue.bak or user equivalence will fail – This clears the login banner. (this may be for Solaris only)

Checked for file /usr/lib/libdce.so [ Metalink Note Note:333348.1]

The 10gR2 installer on Soalris 64 bit fails if the file /usr/lib/libdce.so is present.
Check Metalink Note Note:333348.1 for the workaround.

Configure the .profile of user ORACLE

stty cs8 -istrip -parenb
PATH=/usr/bin:/usr/local/bin
EDITOR=/usr/bin/vi
#umask 077
umask 022
ulimit -c 0
export PATH EDITOR
set -o vi
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
export OH=$ORACLE_HOME
export ORA_CRS_HOME=$ORACLE_BASE/product/crs
export CH=$ORA_CRS_HOME
export ORACLE_SID=SHCL1DR1
#export NLS_LANG=Japanese_Japan.UTF8
export NLS_LANG=AMERICAN_AMERICA.UTF8
export PATH=$ORACLE_HOME/bin:$ORA_CRS_HOME/bin:/sbin:/usr/bin:/usr/ccs/bin:/usr/ucb:/etc:/usr/X/bin:/usr/openwin/bin:/usr/local/bin:/usr/sbin
export PS1=`hostname`” [$ORACLE_SID]\$ “
####
alias bdump=’cd /u01/app/oracle/admin/SHCL1/bdump/’
alias talert=’tail -f $ORACLE_BASE/admin/SHCL1/bdump/alert_$ORACLE_SID.log’
alias tns=’cd $ORACLE_HOME/network/admin’
alias udump=’cd /u01/app/oracle/admin/SHCL1/bdump/’
alias valert=’view $ORACLE_BASE/admin/SHCL1/bdump/alert_$ORACLE_SID.log’
alias home=’cd $ORACLE_HOME’

Created the directories on both nodes

For ORACLE_BASE

mkdir -p /u01/app/oracle
chown -R ORACLE:oinstall /u01/app/oracle
chmod -R 770 /u01/app/oracle

For ORA_CRS_HOME

mkdir -p /u01/app/oracle/product/crs
chown -R root:oinstall /u01/app/oracle/product/crs

For ORACLE_HOME [RDBMS]

mkdir -p /u01/app/oracle/product/10.2.0/db_1
chown -R ORACLE:oinstall /u01/app/oracle/product/10.2.0/db_1

For OCR and Voting disks

mkdir -p /u02/oracle/crs/
mkdir -p /u03/oracle/crs/
mkdir -p /u04/oracle/crs/
Check the privileges on the directories [should be ORACLE:oinstall]

Created OCR and Voting Disk files

In Linux, the oracle provided cluster file system OCSF2 is used on the shared disk,so ‘touch’ ing ocr_disk1 and vote_disk1would do.
But since i used NAS as the shared storage (which is mounted on each nodes), i had to create raw files for ocr and voting disk.
OCR

chown root:oinstall /u02/oracle/crs/ocr_disk1
chown root:oinstall /u03/oracle/crs/ocr_disk2
chmod 660 /u02/oracle/crs/ocr_disk1
chmod 660 /u03/oracle/crs/ocr_disk2

VOTING DISK

chown ORACLE:oinstall /u02/oracle/crs/vote_disk1
chown ORACLE:oinstall /u03/oracle/crs/vote_disk2
chown ORACLE:oinstall /u04/oracle/crs/vote_disk3
chmod 660 /u02/oracle/crs/vote_disk1
chmod 660 /u03/oracle/crs/vote_disk2
chmod 660 /u04/oracle/crs/vote_disk3

Downloaded and unzipped Oracle 10.2.0.1 installation files

10gr2_cluster_sol.cpio
10gr2_companion_sol.cpio
10gr2_db_sol.cpio

Run the Cluster Verification Utility available in

10gr2_cluster_sol.cpio
ownserver01 [SHCL1DR1]$ ./runcluvfy.sh stage -pre crsinst -n OWNSERVER01,OWNSERVER02 -verbose
Performing pre-checks for cluster services setup
Checking node reachability…
Check: Node reachability from node “ownserver01″
Destination Node Reachable?
———————————— ————————
OWNSERVER01 yes
OWNSERVER02 yes
Result: Node reachability check passed from node “ownserver01″.
Checking user equivalence…
Check: User equivalence for user “ORACLE”
Node Name Comment
———————————— ————————
OWNSERVER02 passed
OWNSERVER01 passed
Result: User equivalence check passed for user “ORACLE”.
Checking administrative privileges…
Check: Existence of user “ORACLE”
Node Name User Exists Comment
———— ———————— ————————
OWNSERVER02 yes passed
OWNSERVER01 yes passed
Result: User existence check passed for “ORACLE”.
Check: Existence of group “oinstall”
Node Name Status Group ID
———— ———————— ————————
OWNSERVER02 exists 300
OWNSERVER01 exists 300
Result: Group existence check passed for “oinstall”.
Check: Membership of user “ORACLE” in group “oinstall” [as Primary]
Node Name User Exists Group Exists User in Group Primary Comment
—————- ———— ———— ———— ———— ————
OWNSERVER02 yes yes yes yes passed
OWNSERVER01 yes yes yes yes passed
Result: Membership check for user “ORACLE” in group “oinstall” [as Primary] passed.
Administrative privileges check passed.
Checking node connectivity…
Interface information for node “OWNSERVER02″
Interface Name IP Address Subnet
—————————— —————————— —————-
e1000g0 10.47.2.79 10.47.2.64
e1000g0 10.47.2.80 10.47.2.64
e1000g0 10.47.2.80 10.0.0.0
e1000g1 10.47.2.85 10.47.2.64
e1000g1 10.47.2.86 10.47.2.64
e1000g1 10.47.2.86 10.0.0.0
e1000g2 10.47.2.14 10.47.2.0
e1000g2 10.47.2.15 10.47.2.0
e1000g2 10.47.2.15 10.0.0.0
e1000g3 3.208.169.207 3.208.169.192
e1000g3 3.208.169.208 3.208.169.192
e1000g3 3.208.169.208 3.0.0.0
nxge0 10.47.2.16 10.47.2.0
nxge1 3.208.169.209 3.208.169.192
nxge2 10.47.2.81 10.47.2.64
nxge3 10.47.2.87 10.47.2.64
Interface information for node “OWNSERVER01″
Interface Name IP Address Subnet
—————————— —————————— —————-
e1000g0 10.47.2.76 10.47.2.64
e1000g0 10.47.2.77 10.47.2.64
e1000g0 10.47.2.77 10.0.0.0
e1000g1 10.47.2.82 10.47.2.64
e1000g1 10.47.2.83 10.47.2.64
e1000g1 10.47.2.83 10.0.0.0
e1000g2 10.47.2.11 10.47.2.0
e1000g2 10.47.2.12 10.47.2.0
e1000g2 10.47.2.12 10.0.0.0
e1000g3 3.208.169.203 3.208.169.192
e1000g3 3.208.169.204 3.208.169.192
e1000g3 3.208.169.204 3.0.0.0
nxge0 10.47.2.13 10.47.2.0
nxge1 3.208.169.205 3.208.169.192
nxge2 10.47.2.78 10.47.2.64
nxge3 10.47.2.84 10.47.2.64
Check: Node connectivity of subnet “10.47.2.64″
Source Destination Connected?
—————————— —————————— —————-
OWNSERVER02:e1000g0 OWNSERVER02:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER02:nxge2 yes
OWNSERVER02:e1000g0 OWNSERVER02:nxge3 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER01:nxge2 yes
OWNSERVER02:e1000g0 OWNSERVER01:nxge3 yes
OWNSERVER02:e1000g0 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER02:nxge2 yes
OWNSERVER02:e1000g0 OWNSERVER02:nxge3 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER01:nxge2 yes
OWNSERVER02:e1000g0 OWNSERVER01:nxge3 yes
OWNSERVER02:e1000g1 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER02:nxge2 yes
OWNSERVER02:e1000g1 OWNSERVER02:nxge3 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER01:nxge2 yes
OWNSERVER02:e1000g1 OWNSERVER01:nxge3 yes
OWNSERVER02:e1000g1 OWNSERVER02:nxge2 yes
OWNSERVER02:e1000g1 OWNSERVER02:nxge3 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER01:nxge2 yes
OWNSERVER02:e1000g1 OWNSERVER01:nxge3 yes
OWNSERVER02:nxge2 OWNSERVER02:nxge3 yes
OWNSERVER02:nxge2 OWNSERVER01:e1000g0 yes
OWNSERVER02:nxge2 OWNSERVER01:e1000g0 yes
OWNSERVER02:nxge2 OWNSERVER01:e1000g1 yes
OWNSERVER02:nxge2 OWNSERVER01:e1000g1 yes
OWNSERVER02:nxge2 OWNSERVER01:nxge2 yes
OWNSERVER02:nxge2 OWNSERVER01:nxge3 yes
OWNSERVER02:nxge3 OWNSERVER01:e1000g0 yes
OWNSERVER02:nxge3 OWNSERVER01:e1000g0 yes
OWNSERVER02:nxge3 OWNSERVER01:e1000g1 yes
OWNSERVER02:nxge3 OWNSERVER01:e1000g1 yes
OWNSERVER02:nxge3 OWNSERVER01:nxge2 yes
OWNSERVER02:nxge3 OWNSERVER01:nxge3 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g0 OWNSERVER01:nxge2 yes
OWNSERVER01:e1000g0 OWNSERVER01:nxge3 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g0 OWNSERVER01:nxge2 yes
OWNSERVER01:e1000g0 OWNSERVER01:nxge3 yes
OWNSERVER01:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g1 OWNSERVER01:nxge2 yes
OWNSERVER01:e1000g1 OWNSERVER01:nxge3 yes
OWNSERVER01:e1000g1 OWNSERVER01:nxge2 yes
OWNSERVER01:e1000g1 OWNSERVER01:nxge3 yes
OWNSERVER01:nxge2 OWNSERVER01:nxge3 yes
Result: Node connectivity check passed for subnet “10.47.2.64″ with node(s) OWNSERVER02,OWNSERVER01.
Check: Node connectivity of subnet “10.0.0.0″
Source Destination Connected?
—————————— —————————— —————-
OWNSERVER02:e1000g0 OWNSERVER02:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER02:e1000g2 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g0 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g1 OWNSERVER02:e1000g2 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g1 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g0 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g1 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g1 yes
OWNSERVER01:e1000g0 OWNSERVER01:e1000g2 yes
OWNSERVER01:e1000g1 OWNSERVER01:e1000g2 yes
Result: Node connectivity check passed for subnet “10.0.0.0″ with node(s) OWNSERVER02,OWNSERVER01.
Check: Node connectivity of subnet “10.47.2.0″
Source Destination Connected?
—————————— —————————— —————-
OWNSERVER02:e1000g2 OWNSERVER02:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER02:nxge0 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER01:nxge0 yes
OWNSERVER02:e1000g2 OWNSERVER02:nxge0 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER02:e1000g2 OWNSERVER01:nxge0 yes
OWNSERVER02:nxge0 OWNSERVER01:e1000g2 yes
OWNSERVER02:nxge0 OWNSERVER01:e1000g2 yes
OWNSERVER02:nxge0 OWNSERVER01:nxge0 yes
OWNSERVER01:e1000g2 OWNSERVER01:e1000g2 yes
OWNSERVER01:e1000g2 OWNSERVER01:nxge0 yes
OWNSERVER01:e1000g2 OWNSERVER01:nxge0 yes
Result: Node connectivity check passed for subnet “10.47.2.0″ with node(s) OWNSERVER02,OWNSERVER01.
Check: Node connectivity of subnet “3.208.169.192″
Source Destination Connected?
—————————— —————————— —————-
OWNSERVER02:e1000g3 OWNSERVER02:e1000g3 yes
OWNSERVER02:e1000g3 OWNSERVER02:nxge1 yes
OWNSERVER02:e1000g3 OWNSERVER01:e1000g3 yes
OWNSERVER02:e1000g3 OWNSERVER01:e1000g3 yes
OWNSERVER02:e1000g3 OWNSERVER01:nxge1 yes
OWNSERVER02:e1000g3 OWNSERVER02:nxge1 yes
OWNSERVER02:e1000g3 OWNSERVER01:e1000g3 yes
OWNSERVER02:e1000g3 OWNSERVER01:e1000g3 yes
OWNSERVER02:e1000g3 OWNSERVER01:nxge1 yes
OWNSERVER02:nxge1 OWNSERVER01:e1000g3 yes
OWNSERVER02:nxge1 OWNSERVER01:e1000g3 yes
OWNSERVER02:nxge1 OWNSERVER01:nxge1 yes
OWNSERVER01:e1000g3 OWNSERVER01:e1000g3 yes
OWNSERVER01:e1000g3 OWNSERVER01:nxge1 yes
OWNSERVER01:e1000g3 OWNSERVER01:nxge1 yes
Result: Node connectivity check passed for subnet “3.208.169.192″ with node(s) OWNSERVER02,OWNSERVER01.
Check: Node connectivity of subnet “3.0.0.0″
Source Destination Connected?
—————————— —————————— —————-
OWNSERVER02:e1000g3 OWNSERVER01:e1000g3 yes
Result: Node connectivity check passed for subnet “3.0.0.0″ with node(s) OWNSERVER02,OWNSERVER01.
Suitable interfaces for VIP on subnet “3.208.169.192″:
OWNSERVER02 e1000g3:3.208.169.207 e1000g3:3.208.169.208
OWNSERVER01 e1000g3:3.208.169.203 e1000g3:3.208.169.204
Suitable interfaces for VIP on subnet “3.208.169.192″:
OWNSERVER02 nxge1:3.208.169.209
OWNSERVER01 nxge1:3.208.169.205
I ignored the last messages about the VIP, as i knew there wasn’t any problem.






Click “Add” to add the second node
Public Node Name :ownserver02
Private Node Name:ownserver02ipmp1
Virtual Host Name:ownserverv02







On Ownserver01
bash-3.00# /u01/app/oracle/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oracle/oraInventory to 770.
Changing groupname of /u01/app/oracle/oraInventory to oinstall.
The execution of the script is complete
bash-3.00# ls -l /u01/app/oracle/product/crs/root.sh
-rwxr-xr-x 1 ORACLE oinstall 105 Apr 30 11:14 /u01/app/oracle/product/crs/root.sh
bash-3.00# /u01/app/oracle/product/crs/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: ownserver01 ownserver01ipmp1 ownserver01
node 2: ownserver02 ownserver02ipmp1 ownserver02
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
Now formatting voting device: /u02/oracle/crs/vote_disk1
Now formatting voting device: /u03/oracle/crs/vote_disk2
Now formatting voting device: /u04/oracle/crs/vote_disk3
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
ownserver01
CSS is inactive on these nodes.
ownserver02
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
bash-3.00# /u01/app/oracle/product/crs/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: ownserver01 ownserver01ipmp1 ownserver01
node 2: ownserver02 ownserver02ipmp1 ownserver02
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
ownserver01
ownserver02
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Creating VIP application resource on (2) nodes…
Creating GSD application resource on (2) nodes…
Creating ONS application resource on (2) nodes…
Starting VIP application resource on (2) nodes…
Starting GSD application resource on (2) nodes…
Starting ONS application resource on (2) nodes…
Done.

Friday, April 25, 2008

Knowing when to Rebuild Indexes

Abstract
Indexes provide a fast and efficient method of retrieving selected data from a table. By pointing to the blocks that contain the selected data, the entire table does not have to read in order to extract the required information. Most indexes in Oracle databases are built using the B-tree data structure. Contrary to some widely accepted beliefs, indexes in Oracle are not self-balancing. After a table experiences a large number of inserts, updates, and deletes, the index can become unbalanced and fragmented and can hinder query performance. Knowing when to rebuild the indexes is a topic of some confusion. This paper hopes to shine some light on the subject.

Where is the index now?

In order to understand what we must do with the index, we must first get an idea of the current state of the index. This can be accomplished by using the ANALYZE INDEX VALIDATE STRUCTURE command. Normally, the ANALYZE INDEX command creates either computed or estimated statistics for the index that can be seen in the DBA_INDEXES view. Analyzing the index changes the optimizer’s execution plans for queries that potentially use that index. This action may produce unintentional side effects, especially if the index has not previously been analyzed. The VALIDATE STRUCTURE command can be safely executed without affecting the optimizer. The VALIDATE STRUCTURE command populates the SYS.INDEX_STATS table only. The SYS.INDEX_STATS table can be accessed with the public synonym INDEX_STATS. The INDEX_STATS table will only hold validation information for one index at a time. You will need to query this table before validating the structure of the next index.

Below is an example of ANALYZE INDEX VALIDATE STRUCTURE and sample output from INDEX_STATS:

SQLWKS> ANALYZE INDEX shopping_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,lf_blks,del_lf_rows,distinct_keys,used_space
2> FROM INDEX_STATS;

NAME HEIGHT LF_ROWS LF_BLKS DEL_LF_ROW DISTINCT_K USED_SPACE------------------------- --------- ---------- ---------- ---------- ---------- ----------SHOPPING_BASKET_PK 2 1 3 1 1 65

1 row selected.
I have the information, now what?

There are two rules of thumb to help determine if the index needs to be rebuilt. If it is determined that the index needs to be rebuilt, this can easily be accomplished by the ALTER INDEX REBUILD command. Although not necessarily recommended, this command could be executed during normal operating hours. Rebuilding the index uses the existing index as a basis. The alternative is to drop and re-create the index. Creating an index uses the base table as its data source that needs to put a lock on the table. The index is also unavailable during creation.

First rule of thumb is if the index has height greater than four, rebuild the index. For most indexes, the height of the index will be quite low, i.e. one or two. I have seen an index on a 3 million-row table that had height three. An index with height greater than four may need to be rebuilt as this might indicate a skewed tree structure. This can lead to unnecessary database block reads of the index. It is helpful to know the data structure for the table and index. Most times, the index height should be two or less, but there are exceptions.

The second rule of thumb is that the deleted leaf rows should be less than 20% of the total number of leaf rows. An excessive number of deleted leaf rows indicates that a high number of deletes or updates have occurred to the index column(s). The index should be rebuilt to better balance the tree. The INDEX_STATS table can be queried to determine if there are excessive deleted leaf rows in relation to the total number of leaf rows. Let’s look at an example:

SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as ratio 2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO
------------------------------ ---------- ---------- ---------- ----------ITEM_BASKET_PK 1 235 74 31.4893617

1 row selected.

In this example, the ratio of deleted leaf rows to total leaf rows is clearly above 20%. This is a good candidate for rebuilding. Let’s rebuild the index and examine the results.

SQLWKS> ALTER INDEX item_basket_pk REBUILD;
Statement processed.
SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as ratio
2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO ------------------------------ ---------- ---------- ---------- ----------ITEM_BASKET_PK 1 161 0 0

1 row selected.

The index is rebuilt and validated once again. Examining the INDEX_STATS table shows that the 74 deleted leaf rows were dropped from the index. Notice that the total number of leaf rows went from 235 to 161, which is a difference of 74 leaf rows. This index should provide better performance for the application.

A Case Study

To illustrate the ideas in this paper, a test case was created and various queries were executed against a large table before and after rebuilding the primary key index. It should be noted that this test case was created specifically for this paper and results in the field may vary.

The test table only contains two columns, ID and NAME. The table is described below:
REQ SQL> desc analyze;
Name Null? Type
------------------------------- -------- ----
ID NUMBER NAME VARCHAR2(20)

The test table has a primary key, ANALYZE_PK on the ID field. The test table was populated with 1 million rows of data. After populating the table, over 270,000 rows were selectively deleted from the table. This method ensured that the primary key index would have a sufficient number of deleted leaf rows. After populating and deleting rows from the table, the index was validated as below:

REQ SQL> analyze index analyze_pk validate structure;
Index analyzed.
REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from index_stats;
NAME HEIGHT LF_ROWS DEL_LF_ROWS DEL_LF_ROWS/LF_ROWS------------------------------ ---------- ---------- ----------- -------------------ANALYZE_PK 3 990206 277353 .280096263

It is now clear that the deleted leaf rows comprise approximately 28% of the total leaf rows.

Four different queries were executed against the table. All of these queries will utilize the primary key index as verified by the explain plan. These queries are executed against the table with the current index. The index is then rebuilt. These queries are re-executed and the results are compared. The four queries are:

1. select count(*) from analyze where id between 250000 and 600000;
2. select count(*) from analyze where id in (50000,4321,698754);
3. select count(*) from analyze where id > 500000;
4. select count(*) from analyze where id = 4321;

With the current index, these three queries are executed against the test table. SQL Trace is enabled and the queries are shown in the following excerpt from TKPROF: ********************************************************************************
select count(*) from analyze where id between 250000 and 600000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.47 0.47 0 293 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.47 0.47 0 293 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 9 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.01 0 9 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id > 500000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.11 1.11 0 1611 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.11 1.11 0 1611 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id = 4321

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 3 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************

After these results have been obtained, the index is rebuilt. The index is then validated and the results are show below:

REQ SQL> alter index analyze_pk rebuild;
Index altered.
REQ SQL> analyze index analyze_pk validate structure;
Index analyzed.
REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from index_stats;
NAME HEIGHT LF_ROWS DEL_LF_ROWS DEL_LF_ROWS/LF_ROWS------------------------------ ---------- ---------- ----------- -------------------ANALYZE_PK 3 712853 0 0

It is clear that the deleted leaf rows have been removed from the index. The four queries are run against the table once again. Again, SQL Trace is enabled and the queries are shown in the following excerpt from TKPROF:

********************************************************************************
select count(*) from analyze where id between 250000 and 600000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.23 0.23 0 679 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.23 0.23 0 679 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 9 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 9 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id > 500000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.96 0.96 0 933 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.96 0.96 0 933 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id = 4321

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 3 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

The same queries have now been executed against the test table before and after rebuilding the index. The results are summarized in the following table:


Before index rebuild After index rebuild % Change
CPU Elapsed CPU Elapsed CPU Elapsed
Query 1 0.47 0.47 0.23 0.23 51% 51%
Query 2 0.00 0.01 0.00 0.00 0% 0%
Query 3 1.11 1.11 0.96 0.96 13.5% 13.5%
Query 4 0.00 0.00 0.00 0.00 0% 0%

The results in the above table show that queries 1 and 3 experienced up to 51% improvement in execution time. It should be noted that the queries were executed at least once prior to tracing the queries so that the data would reside in the buffer cache and first time executions of the queries would not skew the results. This fact explains why the CPU and elapsed times for a query are nearly identical. The data was read from the buffer cache. Queries 2 and 4 took minimal time to execute. These queries searched for specific values. Since the data was cached, the CPU and Elapsed times are minimal. Queries 1 and 3 took significantly longer. These queries returned a range of values. This paper does not attempt to generate any findings on the improvement of different types of queries when rebuilding an index. Rather it gives a broader statement that query execution times will improve after rebuilding substandard indexes.

Script to find indexes to rebuild

Below is a sample script that can be run to determine which indexes need to be rebuilt. For those indexes that need to be rebuilt, the ALTER INDEX REBUILD command is dynamically generated as output. The user can tailor the height and percentage of deleted leaf rows by altering the vMaxHeight and vMaxDel variables. The output of this script can be spooled to a file. This file can then be run to rebuild the indexes.

-- validate_idx.sql
-- by Arun Mahendran
-- Database Administrator
-- 26 April 2008
--- This script will check indexes to find candidates for rebuilding.
-- Run this script in SQL*Plus as a user with SELECT ANY TABLE
-- privileges.
---- This script can be used and modified without permission. Run this
-- script at your own risk! The script author is not responsible for
-- any problems that may arise from running this script.
set serveroutput on size 100000
DECLARE
vOwner dba_indexes.owner%TYPE; /* Index Owner
*/ vIdxName dba_indexes.index_name%TYPE; /* Index Name
*/ vAnalyze VARCHAR2(100); /* String of Analyze Stmt
*/ vCursor NUMBER; /* DBMS_SQL cursor
*/ vNumRows INTEGER; /* DBMS_SQL return rows
*/ vHeight index_stats.height%TYPE; /* Height of index tree
*/ vLfRows index_stats.lf_rows%TYPE; /* Index Leaf Rows
*/ vDLfRows index_stats.del_lf_rows%TYPE; /* Deleted Leaf Rows
*/ vDLfPerc NUMBER; /* Del lf Percentage
*/ vMaxHeight NUMBER; /* Max tree height
*/ vMaxDel NUMBER; /* Max del lf percentage
*/
CURSOR cGetIdx IS SELECT owner,index_name
FROM dba_indexes WHERE OWNER NOT LIKE 'SYS%';
BEGIN
/* Define maximums. This section can be customized. */
vMaxHeight := 3;
vMaxDel := 20;

/* For every index, validate structure */
OPEN cGetIdx;
LOOP
FETCH cGetIdx INTO vOwner,vIdxName;
EXIT WHEN cGetIdx%NOTFOUND;
/* Open DBMS_SQL cursor */
vCursor := DBMS_SQL.OPEN_CURSOR;
/* Set up dynamic string to validate structure */
vAnalyze := 'ANALYZE INDEX ' vOwner '.' vIdxName '
VALIDATE STRUCTURE';
DBMS_SQL.PARSE(vCursor,vAnalyze,DBMS_SQL.V7);
vNumRows := DBMS_SQL.EXECUTE(vCursor);
/* Close DBMS_SQL cursor */
DBMS_SQL.CLOSE_CURSOR(vCursor);
/* Does index need rebuilding? */
/* If so, then generate command */
SELECT height,lf_rows,del_lf_rows INTO vHeight,vLfRows,vDLfRows
FROM INDEX_STATS;
IF vDLfRows = 0 THEN
/* handle case where div by zero */
vDLfPerc := 0;
ELSE
vDLfPerc := (vDLfRows / vLfRows) * 100;
END IF;
IF (vHeight > vMaxHeight) OR (vDLfPerc > vMaxDel) THEN DBMS_OUTPUT.PUT_LINE('ALTER INDEX ' vOwner '.'
vIdxName ' REBUILD;');
END IF;
END LOOP;
CLOSE cGetIdx;
END;
/
Summary

One of the areas lacking in Oracle documentation pertains to rebuilding an index. Oracle’s ANALYZE INDEX VALIDATE STRUCTURE command provides a nice way to check an index to see if it is a candidate for rebuilding. This command does not affect the Oracle optimizer’s execution plan for queries that may use the index. The results in the INDEX_STATS are checked after issuing the VALIDATE STRUCTURE command. If an index has excessive height (greater than four) or a high number of deleted leaf rows (over 20% of the total), we rebuild the index.
A test case was studied to show the potential for improvements after rebuilding an index. In one example, the query executed 51% faster after rebuilding the index. While this example is specific to the test case, it does show the potential for improvement after rebuilding an index.
Keywords:

Index Index Rebuild Rebuild Index Online ALTER INDEX REBUILD ANALYZE INDEX VALIDATE STRUCTURE ANALYZE INDEX LF_ROWS DEL_LF_ROW

Thursday, April 17, 2008

Increase in Controlfile Size ???

In Oracle8i and higher, if the MAXDATAFILES limit is reached the controlfile will expand automatically.In Oracle 8i and higher, when you issue CREATE DATABASE or CREATE CONTROLFILE statements, the MAXDATAFILES parameter specifies an initial size of the datafile portion of the control file.

Later, if you add a file whose number exceeds MAXDATAFILES but is less than or equal to the value specified by the DB_FILES initialization parameter, the control file automatically expands to allow the datafile portion to accommodate more files.

We can have DB_FILES set to more than the value of MAXDATAFILES.However note that these DB_FILES parameter value should be within the OS kernel limits.

Refer to Note 144638.1 - Relationship Between Common Init.ora Parameters and Unix Kernel Parameters.

If you are adding datafiles to the database and is within the DB_FILES limit, you will get an error only if control file is unable to allocate more space. Recreating the control file is not required to increase the MAXDATAFILES parameter.It is not good to have a high DB_FILES parameter value ( much higher than required ). Increasing the value of DB_FILES increases the size of the PGA, or Program Global Area, which is allocated for every user process connected to ORACLE.
KeyWords:

CREATE CONTROLFILE CREATE DATABASE controlfile 144638.1 Unix Kernel Parameters OS kernel limits controlfile size

Tuesday, March 18, 2008

RMAN Recovery Methods


RMAN Recovery Methods
When performing a restore operation, it is best to open two telnet sessions, one for SQL commands, and one for RMAN commands. For the rest of this document, RMAN commands will be prefaced with the RMAN> prompt, and SQL commands will be prefaced with the SQL> prompt. A $ indicates that the command is executed from the Unix prompt.


Restoring and Recovering All Datafiles


In this scenario, it is assumed that your control files are still accessible. You have a backup, done for example with backup database plus archivelog;

Your first step is to make sure that the target database is shut down:


$ sqlplus “/ as SYSDBA”

SQL> shutdown abort;

ORACLE instance shut down.


Next, you need to start up your target database in mount mode. RMAN cannot restore datafiles unless the database is at least in mount mode, because RMAN needs to be able to access the control file to determine which backup sets are necessary to recover the database. If the control file isn't available, you have to recover it first. Issue the STARTUP MOUNT command shown in the following example to mount the database:


SQL> startup mount;


Oracle instance started.
Database mounted


Since backup set files are created in an RMAN-specific format, you must use RMAN to restore the datafiles. To use RMAN, connect to the target database:


$ rman target / rcvcat rcvcat/rcvcat@oemprod


The remainder of this example shows how to restore all of the datafiles of the target database. When the restore command is executed, RMAN will automatically go to its last good backup set and restore the datafiles to the state they were in when that backup set was created.

When restoring database files, RMAN reads the datafile header and makes the determination as to whether the file needs to be restored. The recovery is done by allocating a channel for I/O and then issuing the RMAN restore database command.


With Oracle9i and above, you don't need to allocate a channel explicitly. Instead, you can use the default channel mode:


RMAN> restore database;

RMAN> recover database;

SQL> alter database open;


For Oracle8i, the ALLOCATE, RESTORE, and RECOVER commands need to be enclosed by the run{} command:


RMAN> run {

allocate channel d1 type disk;

restore database;

recover database;

}

alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.

Restoring Specific Tablespaces


In this scenario, it is assumed that your control files are still accessible. You have a backup, done for example with backup database plus archivelog;


Take the tablespace that needs recovery offline, restore the tablespace, recover the tablespace, and bring the tablespace online. If you cannot take the tablespace offline, then shutdown abort the database and restore in mount mode.


First try to take the tablespace offline;


$ sqlplus "/ as sysdba"

SQL> alter tablespace tab offline;


If this works, continue with the RMAN recovery:


$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore tablespace tab;

RMAN> recover tablespace tab;

SQL> alter tablespace tab online;


If taking the tablespace offline fails, follow these steps:


$ sqlplus “/ as SYSDBA”

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore tablespace tab;

RMAN> recover tablespace tab;

SQL> alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.

Restoring Specific Datafiles

In this scenario, it is assumed that your control files are still accessible. You have a backup, done for example with backup database plus archivelog;


Take the datafile that needs recovery offline, restore the datafile, recover the datafile, and bring the datafile online. If you cannot take the datafile offline, then shutdown abort the database and restore in mount mode.


First try to take the datafile offline:


SQL> alter database datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf' offline;

If this works, continue with the RMAN recovery:


$ rman target / rcvcat rcvcat/rcvcat@oemprod


RMAN> restore datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf'

RMAN> recover datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf'
SQL> alter database datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf' online;

If taking the datafile offline fails, follow these steps:


$ sqlplus “/ as SYSDBA”

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

RMAN> recover datafile '/u01/oracle/db/AKI1/tab/AKI1_tab.dbf';

SQL> alter database open;


Once the recovery has been completed, execute a complete RMAN backup to establish a new baseline.


Restoring Control Files


In this scenario, it is assumed that your control files are backed up. You have a backup, done for example with backup database plus archivelog;

In an ideal world you'll never use RMAN to restore a control file. But if something catastrophic happens, and you lose all control files, here are the steps for getting them back:


The following examples assume that you are using a catalog. First, here's the simplest Oracle9i syntax for restoring a control file:


$ sqlplus “/ as SYSDBA”

SQL> shutdown abort;

SQL> startup nomount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


Set the database id (DBID) with the following command. This is a 10-digit number that RMAN uses to uniquely identify this database in the recovery log. The number can be obtained from any previous RMAN backup log file.


RMAN> set dbid = xxxxxxxxxx;
RMAN> restore controlfile;

SQL> alter database mount;

SQL> alter database open;

If this fails with ...

ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


... then you must perform a recover database:
SQL> shutdown abort;
SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> recover database;

SQL> alter database open resetlogs;
RMAN> reset database;

Note, that all offline archivelogs are now useless, perform a full back as soon as possible.


Restoring Online Redologs


In this scenario, it is assumed that your control files are backed up. You have a backup, done for example with backup database plus archivelog;

$ sqlplus "/ as sysdba"
SQL> shutdown abort;

SQL> startup nomount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


Set the database id (DBID) with the following command. This is a 10-digit number that RMAN uses to uniquely identify this database in the recovery log. The number can be obtained from any previous RMAN backup log file.


RMAN> set dbid = xxxxxxxxxx;

RMAN> restore controlfile;

SQL> alter database mount;

RMAN> restore database;

RMAM> recover database;

RMAN-00571:=========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS=============

RMAN-00571:=========================================================

RMAN-03002: failure of recover command at 09/28/2004 11:03:23

RMAN-06054: media recovery requesting unknown log: thread 1 seq 1 lowscn 8448414


Since the online logs were lost, complete recovery is not possible. Open the database with resetlogs to continue.

RMAN> alter database open resetlogs;


-------------------------------------IMPORTANT-------------------------------------------
During this type of recovery, if you receive error messages like this:


RMAN> restore database;

Starting restore at 11-JUL-05

using channel ORA_SBT_TAPE_1
using channel ORA_DISK_1
RMAN-00571:=========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============
RMAN-00571:========================================================
RMAN-03002: failure of restore command at 07/11/2005 14:25:22
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 10 found to restore
RMAN-06023: no backup or copy of datafile 9 found to restore
RMAN-06023: no backup or copy of datafile 8 found to restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 5 found to restore
RMAN-06023: no backup or copy of datafile 4 found to restore
RMAN-06023: no backup or copy of datafile 3 found to restore
RMAN-06023: no backup or copy of datafile 2 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore

…use the following directions to recover (recreate the controlfile):


1.With the database mounted, execute ‘alter database backup controlfile to trace resetlogs;’


2.Perform a shutdown abort on the database, but remain at the SQL> prompt.


3.In another telnet session, go to the udump directory to retrieve the resulting trace file and copy it to another location to edit it.


4.Edit the file and add the phrase “until cancel” to the recover database command at the end. The phrase should read “recover database until cancel using backup controlfile”. Remove the “alter database open” command after the recover command. Save the file with a .sql extension.


5.Back at the SQL> prompt, execute the modified trace file. When prompted for an archived log, type in “cancel” and the reply should be “media recovery cancelled”.


6.Issue “alter database open resetlogs”. The database should open after a few moments.


7.Connect to the RMAN recovery catalog and issue the “reset database” command.


8.Perform a full RMAN backup as soon as possible.

----------------------------------------------------------------------------------------
Time-Based, Change-Based, or SCN-based Incomplete Recovery


Incomplete recovery uses a backup to produce a non-current version of the database. In other words, youdo not apply all of the redo records generated after the most recent backup.


You usually perform incomplete recovery of the whole database in the following situations:


Media failure destroys some or all of the online redo logs.
A user error causes data loss, for example, a user inadvertently drops a table.
You cannot perform complete recovery because an archived redo log is missing.
You lose your current control file and must use a backup control file to open the database.


To perform incomplete media recovery, you must restore all datafiles from backups created prior to the time to which you want to recover and then open the database with the RESETLOGS option when recovery completes. The RESETLOGS operation creates a new incarnation of the database; in other words, a database with a new stream of log sequence numbers starting with log sequence 1.


NOTE – Start every RMAN incomplete recovery with the following commands:

$ sqlplus "/ as sysdba"

SQL> shutdown abort;

SQL> startup mount;

$ rman target / rcvcat rcvcat/rcvcat@oemprod


--For time-based recovery, use these commands:


RMAN> restore database until time "to_date('07/12/05 12:0:00','MM/DD/YY HH24:MI:SS')RMAN> recover database until time "to_date('07/12/05 12:0:00','MM/DD/YY HH24:MI:SS')

media recovery complete.SQL> alter database open resetlogs;


--For SCN-based recovery, user these commands:
RMAN> restore database until scn 1000;

RMAN> recover database until scn 1000;

media recovery complete.

SQL> alter database open resetlogs;


--For change-based recovery, user these commands:


RMAN> restore database until sequence 9923; --Archived log sequence number
RMAN> recover database until sequence 9923; --Archived log sequence numbermedia recovery complete.

SQL> alter database open resetlogs;


Once the recovery has been completed, execute the following steps:

Delete prior backups with this command (from the RMAN prompt):

RMAN> delete force backup;

This command removes all prior backups from the RMAN catalog as they can no longer be used once the database has been restarted with the resletlogs option. After completing this command, create a new RMAN backup to establish a new baseline.

Recovering Archived Logs only

In the event that you want to recover the database archived redo logs until a desired time, you can use the following commands:

$ rman target / rcvcat rcvcat/rcvcat@oemprod

RMAN> restore ARCHIVELOG FROM TIME 'SYSDATE-1' UNTIL TIME 'SYSDATE';
or
RMAN> restore ARCHIVELOG FROM TIME "to_date('07/11/05 00:00:01','MM/DD/YY HH24:MI:SS')UNTIL TIME 'SYSDATE';



By

-6N


Keywords:

RMAN Recovery Methods Restoring and Recovering All Datafiles Recovering Datafiles ALLOCATE, RESTORE, and RECOVER commands RMAN backup Restoring Specific Tablespaces Restoring Specific Datafiles Restoring Control Files offline archivelogs Restoring Online Redologs Perform a full RMAN backup Incomplete recovery Time-Based, Change-Based, or SCN-based Incomplete Recovery

Wednesday, March 5, 2008

Hot Backup internals

Why do I have to put a tablespace into a special hot backup mode when I perform a backup with operating system 'copy' or 'cp' commands? Why can't I just copy the files?



An operating system doesn't understand how Oracle works internally. So when you ask it to copy a database file, it has no idea of the significance of the internal organisation of the file it's copying, and simply copies whichever pieces of it happen to swing under the disk head from time to time. That is unfortunate for two separate reasons.



First, Oracle uses the first database block within every data file (known as the data file header block) to store that file's checkpoint change number -and it is this number which acts as the 'timestamp' for the entire data file. Since a recovery involves applying redo which was generated after the 'time' of a data file, it's important that the checkpoint change number for a data file faithfully records the earliest age of any part of that data file. We can then roll everything forward from that earliest point, and arrive at a totally consistent file as a whole. But if the checkpoint change number happens to state a time which is later than the actual time of even just one part of the data file's contents, then Oracle will not know to roll that particular piece of the file forward, and it will be 'left behind'. You'll have a corrupt data file on your hands, and an unrecoverable tablespace -and if it's the SYSTEM tablespace we're talking about, you've just lost your entire database.



For example, suppose my data file's checkpoint change number reads '1000', but there are actually blocks of data in that file which are actually from time '900'. During a recovery, Oracle will think it should apply redo from time 1000 onwards (because that's what the data file's header block tells it). But that leaves some blocks very out-of-synch with the rest of the file.



So the data file's header block is crucial in telling us the point from which we need to start rolling forward. But if you are doing a hot backup, then the instance? (specifically, the CKPT background process) may need to update the contents of that block whilst the file is being copied. That would not be a problem if the operating system was smart enough to know that the data file header block should be the first block copied... but, because it doesn't understand the internal organisation of an Oracle datafile, it doesn't know that. It may therefore copy that block after CKPT has updated it... and the backup file is therefore immediately rendered useless for recovery purposes.



Now that is our first problem with just copying hot data files: parts of a data file being out of synchronisation with each other, and with an unreliable checkpoint change number in the data file header, mean the data file as a whole is unusable. But the same sort of problem also arises within an Oracle block itself. We tend to think of an Oracle block as a single entity -but it isn't. Physically, an Oracle block is made up of multiple operating system blocks: for example, an 8K database block is usually comprised of 16 separate O/S blocks (because most O/S blocks are 512 bytes in size). For an Oracle block to be readable by an instance, all 16 of those O/S blocks must be synchronised, or from the same 'database time'. Unfortunately, if it's an operating system utility which is copying the database blocks, that utility has no idea that all 16 O/S blocks need to be kept synchronised: it just copies whichever of them happen to turn up under the disk head at any one moment.



But if people are actually modifying the contents of an Oracle block during the copy operation (which is definitely possible during a hot backup, of course), then you could conceivably end up with a copy of what we think of as a single Oracle block that is actually from 16 different database times. Such a block copy is termed a fractured block, and it's unusable in a recovery scenario.



So if you just copy your datafiles, whilst they're in use, you will end up with inconsistent block images, and a datafile header block that does not correctly describe the time of the entire datafile. Such a copy is useless.



OK, but what precisely does the begin backup command do to prevent these problems arising?



Two separate things. To deal with the data file header block not accurately describing the earliest time of the contents of the data file, the command causes the header block to be locked from all further updates by the CKPT process. If the data file header block is locked, so that it records the database time from just before it started to be copied, it must tell a valid point from which to start applying redo in a recovery scenario.



For example, if I lock the header block when it contains a checkpoint change number of '1000'; and if I then take half an hour to copy the entire file; then there are bound to be blocks within the resulting backup file from time 1100, 1200, 1400 and so on. But there will not be blocks from a time before 1000, and therefore no bits of the file would get 'forgotten' during a recovery.



Second: to deal with the fractured block issue, the begin backup command sets a flag which tells Oracle that the first piece of DML to modify any part of a block whilst its file is being copied will cause the entire block to be written to the redo logs. This ensures that whilst the image of the block in the backup file is fractured, and hence useless, there is a clean image of the block in the redo stream -which we can obviously use to replace the fractured version during a recovery operation.



I think I understand. Can you show me these two things happening?



Sure. The locked datafile header block is easiest to demonstrate. First, I issue this piece of SQL:





SELECT T.NAME, H.FILE#, H.CHECKPOINT_CHANGE#

FROM V$TABLESPACE T, V$DATAFILE_HEADER H

WHERE T.TS#=H.TS#;



NAME FILE# CHECKPOINT_CHANGE#

SYSTEM 1 121923

UNDOTBS 2 121923

INDX 3 121923

TOOLS 4 121923

USERS 5 121923



So, all tablespaces (and all data files) start off synchronised with the same checkpoint change number. Now let's go through the motion of beginning a hot backup and see what happens:



alter tablespace users begin backup;



select t.name, h.file#, h.checkpoint_change#

from v$tablespace t, v$datafile_header h

where t.ts#=h.ts#;





NAME FILE# CHECKPOINT_CHANGE

SYSTEM 1 121923

UNDOTBS 2 121923

INDX 3 121923

TOOLS 4 121923

USERS 5 121925





Do you notice how the USERS tablespace is now actually slightly ahead of the time the rest of the database has gotten to? That's because, as a tablespace is put into hot backup, any dirty buffers which belong to that tablespace are flushed by DBWR from the buffer cache back down to disk. CKPT then updates the time in that file's header block. In short, putting a tablespace into hot backup causes that tablespace (alone) to be checkpointed.





That datafile doesn't stay ahead of the rest of the database for long, however. If we simulate some transactional activity on the database, we can check the timestamps again:



alter system switch logfile;

alter system switch logfile;

alter system switch logfile;

alter system switch logfile;



At each log switch (which would normally be caused by users continuing to perform transactions, of course), a system-wide checkpoint is issued. So if we now check the data file headers again, we should see some changes:



select t.name, h.file#, h.checkpoint_change#

from v$tablespace t, v$datafile_header h

where t.ts#=h.ts#;



NAME FILE# CHECKPOINT_CHANGE

SYSTEM 1 121939

UNDOTBS 2 121939

INDX 3 121939

TOOLS 4 121939

USERS 5 121925





You might need to look at those numbers carefully to spot the effect, but if you do, you'll see that file 5 is no longer ahead of the pack. In fact, it's still stuck very firmly at timestamp 121925, whilst the other files have moved on to later times. Of course, the checkpoint change number reported here for file 5 is a complete lie in a sense: the contents of file 5 will be just as much at time 121939 as the rest of the database. But the 'begin backup' has locked the header so that it reports the time the file started being backed up, not the time of its contents.



OK. The data file header issue I can see. What about the block-sized redo thing?





That one's a bit harder to demonstrate. We'll first have to run an Oracle script that makes it possible to see how much redo a transaction generates. Then we can perform the same transaction when the data file is not being backed up and compare it with when it is.





First, I connect to my database as a normal user -in this case, as Scott. I then run the Oracle-supplied UTLXPLAN.SQL script. And as SYS, I then grant Scott the DBA role as a quick-fix way of allowing Scott to make use of the redo-measuring features that the UTLXPLAN script makes possible:





sqlplus scott/tiger

@%ORACLE_HOME%\rdbms\admin\utlxplan.sql



connect / as sysdba

grant dba to scott;

connect scott/tiger



Now we can test the amount of redo a transaction generates normally:





set autotrace trace stat



update emp set sal=900 where ename='MILLER';



Statistics

339 recursive calls

3 db block gets

65 consistent gets

5 physical reads

188 redo size

631 bytes sent via SQL*Net to client

548 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

8 sorts (memory)

0 sorts (disk)

1 rows processed





The set autotrace trace stat command means we get to see a short report about what went on inside Oracle to process the update command I issued. As you can see, the update of Miller's salary generated 188 bytes of redo. That's what this particular update generates “normally”. Now, let me put the tablespace where the EMP table lives into hot backup and issue exactly the same update:



alter tablespace users begin backup;



update emp set sal=900 where ename='MILLER';



Statistics

0 recursive calls

2 db block gets

3 consistent gets

0 physical reads

8740 redo size

632 bytes sent via SQL*Net to client

548 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

1 rows processed



Suddenly, the exact same transaction now generates 8,740 bytes of redo. That's an extra 8,552 bytes, or a 4500% increase! The reason is not hard to find:

Keywords:

Hot Backup internals special hot backup mode suppose my data file's checkpoint CKPT has updated it oracle hot backup Oracle hotbackup Internals of Oracle hotbackup Internals of Oracle hot backup how hot backup works what is happening in hot backup begin backup alter tablespace set autotrace trace stat system-wide checkpoint

Oracle Memory Tuning

Redo Log Buffer Latches
Database Buffer Cache Size
Shared Pool Size

Redo Log Buffer Latches

When a transaction is ready to write its changes to the redo log, it first has to grab the Redo Allocation Latch, of which there is only one, to keep others from writing to the log at the same time. If someone else has that latch, it has to wait for the latch, resulting in a "miss".

Once it grabs that latch, if the change is larger than log_small_entry_max_size bytes and if your server has multiple CPU's, it then tries to grab a Redo Copy Latch, of which there can be up to 2 times the number of CPU's, which would allow it to release the Redo Allocation Latch for someone else to use. If none of them are available, resulting in an "immediate miss", it will not wait for a Redo Copy Latch (thus, the "immediate"), but, instead, hangs on to the Redo Allocation Latch until the change is written.

Oracle keeps statistics for these latches in v$latch, including the number of gets and misses for the Redo Allocation Latch and the number of immediate gets and immediate misses for the Redo Copy Latches, which are cumulative values since instance startup. If you've got a 100% hit ratio for either of those latch types, that's a good thing. It just means that all of your transactions were able to grab and use the latch without retrying. It's when you get below a 99% hit ratio that you need to start looking out. The following sql figures the current hit ratios for those latches:


column latch_name format a20
select name latch_name, gets, misses,
round(decode(gets-misses,0,1,gets-misses)/
decode(gets,0,1,gets),3) hit_ratio
from v$latch where name = 'redo allocation';

column latch_name format a20
select name latch_name, immediate_gets, immediate_misses,
round(decode(immediate_gets-immediate_misses,0,1,
immediate_gets-immediate_misses)/
decode(immediate_gets,0,1,immediate_gets),3) hit_ratio
from v$latch where name = 'redo copy';

If your Redo Allocation Latch hit ratio consistently falls below 99%, and if you have a multi-CPU machine, you can lower the value for log_small_entry_max_size (see below) in your init.ora file (ours is currently 800 bytes, but, maybe 100 or so bytes may be better - you'll have to try out different values over time), which says that any change smaller than that will hang onto the Redo Allocation Latch until Oracle is finished writing that change. Anything larger than that grabs a Redo Copy Latch, if currently available, and releases the Redo Allocation Latch for another transaction to use.

If your Redo Copy Latch hit ratio consistently falls below 99%, and if you have a multi-CPU machine, you can raise the value of log_simultaneous_copies in your init.ora file up to twice the number of CPU's to provide more Redo Copy Latches (there is only one Redo Allocation Latch, so it is at a premium). Remember that you have to shut down your database instance and restart it to reread the new parameter values in the init.ora file ($ORACLE_HOME/dbs/initSID.ora). The following sql shows the current values for those associated parameters:

column name format a30
column value format a10
select name,value from v$parameter where name in ('log_small_entry_max_size','log_simultaneous_copies',
'cpu_count');


Database Buffer Cache Size


The Database Buffer Cache is part of the Shared Global Area (SGA) in memory for a single database instance (SID) and holds the blocks of data and indexes that you and everyone else is currently using. It may even contain multiple copies of the same data block if, for example, more than one transaction is making changes to it but not yet committed, or, if you are looking at the original copy (select) and someone else is looking at their modified but uncommitted copy (insert, update, or delete). The parameters db_block_buffers and db_block_size in your init.ora file determine the size of the buffer cache. db_block_size, in bytes, is set at database creation, and cannot be changed (unless you recreate the database from scratch), so, the only thing that you can adjust is the number of blocks in db_block_buffers (one buffer holds one block).
The Database Buffer Cache is part of the Shared Global Area (SGA) in memory for a single database instance (SID) and holds the blocks of data and indexes that you and everyone else is currently using. It may even contain multiple copies of the same data block if, for example, more than one transaction is making changes to it but not yet committed, or, if you are looking at the original copy (select) and someone else is looking at their modified but uncommitted copy (insert, update, or delete). The parameters db_block_buffers and db_block_size in your init.ora file determine the size of the buffer cache. db_block_size, in bytes, is set at database creation, and cannot be changed (unless you recreate the database from scratch), so, the only thing that you can adjust is the number of blocks in db_block_buffers (one buffer holds one block).

The Cache Hit Ratio shows how many blocks were already in memory (logical reads, which include "db block gets" for blocks you are using and "consistent gets" of original blocks from rollback segments that others are updating) versus how many blocks had to be read from disk ("physical reads"). Oracle recommends that this ratio be at least 80%, but, I like at least 90% myself. The ratio can be obtained from values in v$sysstat, which are constantly being updated and show statistics since database startup (it is only accessable from a DBA user account). You will get a more representative sample if the database has been running several hours with normal user transactions taking place. The Cache Hit Ratio is determined as follows:

select (1-(pr.value/(dbg.value+cg.value)))*100
from v$sysstat pr, v$sysstat dbg, v$sysstat cg
where pr.name = 'physical reads'
and dbg.name = 'db block gets'
and cg.name = 'consistent gets';

If you have a low Cache Hit Ratio, you can test to see what the effect of adding buffers would be by putting "db_block_lru_extended_statistics = 1000" in the init.ora file, doing a shutdown and startup of the database, and waiting a few hours to get a representative sample. Oracle determines how many Additional Cache Hits (ACH) would occur for each query and transaction for each of the 1000 buffer increments (or whatever other maximum value you might want to try out), and places them into the x$kcbrbh table, which is only accessable from user "sys". To measure the new Cache Hit Ratio with, for example, 100 extra buffers, determine ACH as follows:

select sum(count) "ACH" from x$kcbrbh where indx < 100; and plug that value into the Cache Hit Ratio formula as follows: select (1-((pr.value-&ACH)/(dbg.value+cg.value)))*100 from v$sysstat pr, v$sysstat dbg, v$sysstat cg where pr.name = 'physical reads' and dbg.name = 'db block gets' and cg.name = 'consistent gets'; If the ratio originally was lower than 80% and is now higher with ACH, you may want to increase db_block_buffers by that number of extra buffers, restarting your database to put the increase into effect. Be sure to try several values for the number of extra buffers to find an optimum for your work load. Also, remove db_block_lru_extended_statistics from your init.ora file before restarting your database to stop gathering statistics, which tends to slow down the transaction time. (Removing that clears the x$kcbrbh table.) Also, make sure that your server has enough memory to accomodate the increase! If you are running really tight on memory, and the Cache Hit Ratio is running well above 80%, you might want to check the effect of lowering the number of buffers, which would release Oracle memory that could then be used by other processes, but would also potentially slow down database transactions. To test this, put "db_block_lru_statistics = true" in your init.ora file and restart your database. This gathers statistics for Additional Cache Misses (ACM) that would occur for each query and transaction for each of the buffer decrements up to the current db_block_buffers value, placing them into the x$kcbcbh table, also only accessable from user "sys". To measure the new Cache Hit Ratio with, for example, 100 fewer buffers, determine ACM as follows: select sum(count) "ACM" from x$kcbcbh where indx >= (select max(indx)+1-100 from x$kcbcbh);

and plug that value into the Cache Hit Ratio formula as follows:

select (1-((pr.value+&ACM)/(dbg.value+cg.value)))*100
from v$sysstat pr, v$sysstat dbg, v$sysstat cg
where pr.name = 'physical reads'
and dbg.name = 'db block gets'
and cg.name = 'consistent gets';

If the ratio is still above 80%, you may want to decrease db_block_buffers by that number of fewer buffers, restarting your database to put the decrease into effect. Be sure to try several values for the number of fewer buffers to find an optimum for your work load. Also, remove db_block_lru_statistics from your init.ora file before restarting your database to stop gathering statistics, which tends to slow down the transaction time. (Removing that clears the x$kcbcbh table.)

I have three scripts which you can use to figure your instance's optimum number of db_block_buffers. And computes the current ratio for the database buffer cache, and can be run from any DBA account. And computes the resulting ratio for an increase in the buffer cache size of the given number of buffer blocks (figuring ACH itself). It must be run from user "sys", after a representative sampling time with db_block_lru_extended_statistics in place. . It must be run from user "sys", after a representative sampling time with db_block_lru_statistics in place.

Shared Pool Size


The Shared Pool is also part of the Shared Global Area (SGA) in memory for a single database instance (SID) and holds the Library Cache with the most recently used SQL statements and parse trees along with PL/SQL blocks, and the Data Dictionary Cache with definitions of tables, views, and other dictionary objects. Both of those sets of cached objects can be used by one or more users, and are aged out (Least Recently Used) as other objects need the space. (You can pin large frequently-used objects in the Shared Pool for performance and other reasons, but, I won't go into that here.)

There are several ratios that you can check after a representative sample time that may indicate that you need to enlarge the shared pool, which is set by the shared_pool_size parameter in your init.ora file and defaults to 3500000 (3.5 Meg). One indicator is the Library Cache Get Hit Ratio, which shows how many cursors are being shared (SQL statements (gets) which were already found and parsed (gethits) in the shared pool, with no parsing or re-parsing needed), and is determined by:

select gethits,gets,gethitratio from v$librarycache
where namespace = 'SQL AREA';

If the gethitratio is less than 90%, you should consider increasing the shared pool size. Another indicator is the reloads per pin ratio, which shows how many parsed statements (pins) have been aged out (reloaded) of the shared pool for lack of space (ideally 0), and is determined by:

select reloads,pins,reloads/pins from v$librarycache
where namespace = 'SQL AREA';

If the reloads/pins ratio is more than 1%, you should consider increasing the shared pool size. A third indicator, which is not as important as the first two, is the dictionary object getmisses per get ratio, which shows how many cached dictionary object definitions in the dictionary cache are encountering too many misses (aged out?), and is determined by:

select sum(getmisses),sum(gets),sum(getmisses)/sum(gets)
from v$rowcache;

If the getmisses/gets ratio is more than 15%, you should consider increasing the shared pool size.

If these ratios indicate that your shared pool is too small, you can estimate the size of the shared pool by doing the following. Set the shared_pool_size to a very large number, maybe a fourth or more of your system's available memory, depending on how many other instances and processes that you have running that are also using memory, then shutdown and startup your database and let it run for a representative time (like all day or when a large batch job is running that you want to accomodate), then, figure the memory required for packages and views, memory required for frequently used SQL statements, and memory required for users SQL statements executed, as shown below:

select sum(sharable_mem) "Packages/Views" from v$db_object_cache;
select sum(sharable_mem) "SQL Statements" from v$sqlarea
where executions > 5;select sum(250 * users_opening) "SQL Users" from v$sqlarea;

Then, add the above three numbers and multiply the results by 2.5. Use this estimated size as a guideline for the value for shared_pool_size, changing that parameter to the estimated size or back to the original size and doing another shutdown/startup to put the value into effect.

select sum(a.spspv) "Packages/Views", sum(a.spssql) "SQL
Statements",
sum(a.spsusr) "SQL Users", round((sum(a.spspv) + sum(a.spssql) +
sum(a.spsusr)) * 2.5,-6) "Estimated shared_pool_size"
from (select sum(sharable_mem) spspv, 0 spssql, 0 spsusr
from v$db_object_cache
union all
select 0, sum(sharable_mem), 0 from v$sqlarea
where executions > 5
union all
select 0, 0, sum(250 * users_opening) from v$sqlarea) a;

Keywords:

Oracle Memory Tuning Redo Log Buffer Latches Database Buffer Cache Size Shared Pool Size Redo Allocation Latch v$latch 99% hit ratio Shared Global Area Database Buffer Cache db_block_size physical reads v$sysstat db block gets db_block_lru_statistics shared_pool_size librarycache pins cached dictionary object definitions shared pool advisery