Friday, April 25, 2008

Knowing when to Rebuild Indexes

Abstract
Indexes provide a fast and efficient method of retrieving selected data from a table. By pointing to the blocks that contain the selected data, the entire table does not have to read in order to extract the required information. Most indexes in Oracle databases are built using the B-tree data structure. Contrary to some widely accepted beliefs, indexes in Oracle are not self-balancing. After a table experiences a large number of inserts, updates, and deletes, the index can become unbalanced and fragmented and can hinder query performance. Knowing when to rebuild the indexes is a topic of some confusion. This paper hopes to shine some light on the subject.

Where is the index now?

In order to understand what we must do with the index, we must first get an idea of the current state of the index. This can be accomplished by using the ANALYZE INDEX VALIDATE STRUCTURE command. Normally, the ANALYZE INDEX command creates either computed or estimated statistics for the index that can be seen in the DBA_INDEXES view. Analyzing the index changes the optimizer’s execution plans for queries that potentially use that index. This action may produce unintentional side effects, especially if the index has not previously been analyzed. The VALIDATE STRUCTURE command can be safely executed without affecting the optimizer. The VALIDATE STRUCTURE command populates the SYS.INDEX_STATS table only. The SYS.INDEX_STATS table can be accessed with the public synonym INDEX_STATS. The INDEX_STATS table will only hold validation information for one index at a time. You will need to query this table before validating the structure of the next index.

Below is an example of ANALYZE INDEX VALIDATE STRUCTURE and sample output from INDEX_STATS:

SQLWKS> ANALYZE INDEX shopping_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,lf_blks,del_lf_rows,distinct_keys,used_space
2> FROM INDEX_STATS;

NAME HEIGHT LF_ROWS LF_BLKS DEL_LF_ROW DISTINCT_K USED_SPACE------------------------- --------- ---------- ---------- ---------- ---------- ----------SHOPPING_BASKET_PK 2 1 3 1 1 65

1 row selected.
I have the information, now what?

There are two rules of thumb to help determine if the index needs to be rebuilt. If it is determined that the index needs to be rebuilt, this can easily be accomplished by the ALTER INDEX REBUILD command. Although not necessarily recommended, this command could be executed during normal operating hours. Rebuilding the index uses the existing index as a basis. The alternative is to drop and re-create the index. Creating an index uses the base table as its data source that needs to put a lock on the table. The index is also unavailable during creation.

First rule of thumb is if the index has height greater than four, rebuild the index. For most indexes, the height of the index will be quite low, i.e. one or two. I have seen an index on a 3 million-row table that had height three. An index with height greater than four may need to be rebuilt as this might indicate a skewed tree structure. This can lead to unnecessary database block reads of the index. It is helpful to know the data structure for the table and index. Most times, the index height should be two or less, but there are exceptions.

The second rule of thumb is that the deleted leaf rows should be less than 20% of the total number of leaf rows. An excessive number of deleted leaf rows indicates that a high number of deletes or updates have occurred to the index column(s). The index should be rebuilt to better balance the tree. The INDEX_STATS table can be queried to determine if there are excessive deleted leaf rows in relation to the total number of leaf rows. Let’s look at an example:

SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as ratio 2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO
------------------------------ ---------- ---------- ---------- ----------ITEM_BASKET_PK 1 235 74 31.4893617

1 row selected.

In this example, the ratio of deleted leaf rows to total leaf rows is clearly above 20%. This is a good candidate for rebuilding. Let’s rebuild the index and examine the results.

SQLWKS> ALTER INDEX item_basket_pk REBUILD;
Statement processed.
SQLWKS> ANALYZE INDEX item_basket_pk VALIDATE STRUCTURE;
Statement processed.
SQLWKS> SELECT name,height,lf_rows,del_lf_rows,(del_lf_rows/lf_rows)*100 as ratio
2> FROM INDEX_STATS;
NAME HEIGHT LF_ROWS DEL_LF_ROW RATIO ------------------------------ ---------- ---------- ---------- ----------ITEM_BASKET_PK 1 161 0 0

1 row selected.

The index is rebuilt and validated once again. Examining the INDEX_STATS table shows that the 74 deleted leaf rows were dropped from the index. Notice that the total number of leaf rows went from 235 to 161, which is a difference of 74 leaf rows. This index should provide better performance for the application.

A Case Study

To illustrate the ideas in this paper, a test case was created and various queries were executed against a large table before and after rebuilding the primary key index. It should be noted that this test case was created specifically for this paper and results in the field may vary.

The test table only contains two columns, ID and NAME. The table is described below:
REQ SQL> desc analyze;
Name Null? Type
------------------------------- -------- ----
ID NUMBER NAME VARCHAR2(20)

The test table has a primary key, ANALYZE_PK on the ID field. The test table was populated with 1 million rows of data. After populating the table, over 270,000 rows were selectively deleted from the table. This method ensured that the primary key index would have a sufficient number of deleted leaf rows. After populating and deleting rows from the table, the index was validated as below:

REQ SQL> analyze index analyze_pk validate structure;
Index analyzed.
REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from index_stats;
NAME HEIGHT LF_ROWS DEL_LF_ROWS DEL_LF_ROWS/LF_ROWS------------------------------ ---------- ---------- ----------- -------------------ANALYZE_PK 3 990206 277353 .280096263

It is now clear that the deleted leaf rows comprise approximately 28% of the total leaf rows.

Four different queries were executed against the table. All of these queries will utilize the primary key index as verified by the explain plan. These queries are executed against the table with the current index. The index is then rebuilt. These queries are re-executed and the results are compared. The four queries are:

1. select count(*) from analyze where id between 250000 and 600000;
2. select count(*) from analyze where id in (50000,4321,698754);
3. select count(*) from analyze where id > 500000;
4. select count(*) from analyze where id = 4321;

With the current index, these three queries are executed against the test table. SQL Trace is enabled and the queries are shown in the following excerpt from TKPROF: ********************************************************************************
select count(*) from analyze where id between 250000 and 600000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.47 0.47 0 293 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.47 0.47 0 293 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 9 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.01 0 9 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id > 500000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.11 1.11 0 1611 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.11 1.11 0 1611 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id = 4321

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 3 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************

After these results have been obtained, the index is rebuilt. The index is then validated and the results are show below:

REQ SQL> alter index analyze_pk rebuild;
Index altered.
REQ SQL> analyze index analyze_pk validate structure;
Index analyzed.
REQ SQL> select name,height,lf_rows,del_lf_rows,del_lf_rows/lf_rows from index_stats;
NAME HEIGHT LF_ROWS DEL_LF_ROWS DEL_LF_ROWS/LF_ROWS------------------------------ ---------- ---------- ----------- -------------------ANALYZE_PK 3 712853 0 0

It is clear that the deleted leaf rows have been removed from the index. The four queries are run against the table once again. Again, SQL Trace is enabled and the queries are shown in the following excerpt from TKPROF:

********************************************************************************
select count(*) from analyze where id between 250000 and 600000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.23 0.23 0 679 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.23 0.23 0 679 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
138614 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id in (50000,4321,698754)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 9 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 9 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
0 CONCATENATION
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id > 500000

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.96 0.96 0 933 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.96 0.96 0 933 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
445545 INDEX (RANGE SCAN) OF 'ANALYZE_PK' (UNIQUE)
********************************************************************************
select count(*) from analyze where id = 4321

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 3 0 1

Misses in library cache during parse: 0
Optimizer goal: CHOOSEParsing user id: 5 (SYSTEM)

Rows Execution Plan------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (AGGREGATE)
1 INDEX (UNIQUE SCAN) OF 'ANALYZE_PK' (UNIQUE)

The same queries have now been executed against the test table before and after rebuilding the index. The results are summarized in the following table:


Before index rebuild After index rebuild % Change
CPU Elapsed CPU Elapsed CPU Elapsed
Query 1 0.47 0.47 0.23 0.23 51% 51%
Query 2 0.00 0.01 0.00 0.00 0% 0%
Query 3 1.11 1.11 0.96 0.96 13.5% 13.5%
Query 4 0.00 0.00 0.00 0.00 0% 0%

The results in the above table show that queries 1 and 3 experienced up to 51% improvement in execution time. It should be noted that the queries were executed at least once prior to tracing the queries so that the data would reside in the buffer cache and first time executions of the queries would not skew the results. This fact explains why the CPU and elapsed times for a query are nearly identical. The data was read from the buffer cache. Queries 2 and 4 took minimal time to execute. These queries searched for specific values. Since the data was cached, the CPU and Elapsed times are minimal. Queries 1 and 3 took significantly longer. These queries returned a range of values. This paper does not attempt to generate any findings on the improvement of different types of queries when rebuilding an index. Rather it gives a broader statement that query execution times will improve after rebuilding substandard indexes.

Script to find indexes to rebuild

Below is a sample script that can be run to determine which indexes need to be rebuilt. For those indexes that need to be rebuilt, the ALTER INDEX REBUILD command is dynamically generated as output. The user can tailor the height and percentage of deleted leaf rows by altering the vMaxHeight and vMaxDel variables. The output of this script can be spooled to a file. This file can then be run to rebuild the indexes.

-- validate_idx.sql
-- by Arun Mahendran
-- Database Administrator
-- 26 April 2008
--- This script will check indexes to find candidates for rebuilding.
-- Run this script in SQL*Plus as a user with SELECT ANY TABLE
-- privileges.
---- This script can be used and modified without permission. Run this
-- script at your own risk! The script author is not responsible for
-- any problems that may arise from running this script.
set serveroutput on size 100000
DECLARE
vOwner dba_indexes.owner%TYPE; /* Index Owner
*/ vIdxName dba_indexes.index_name%TYPE; /* Index Name
*/ vAnalyze VARCHAR2(100); /* String of Analyze Stmt
*/ vCursor NUMBER; /* DBMS_SQL cursor
*/ vNumRows INTEGER; /* DBMS_SQL return rows
*/ vHeight index_stats.height%TYPE; /* Height of index tree
*/ vLfRows index_stats.lf_rows%TYPE; /* Index Leaf Rows
*/ vDLfRows index_stats.del_lf_rows%TYPE; /* Deleted Leaf Rows
*/ vDLfPerc NUMBER; /* Del lf Percentage
*/ vMaxHeight NUMBER; /* Max tree height
*/ vMaxDel NUMBER; /* Max del lf percentage
*/
CURSOR cGetIdx IS SELECT owner,index_name
FROM dba_indexes WHERE OWNER NOT LIKE 'SYS%';
BEGIN
/* Define maximums. This section can be customized. */
vMaxHeight := 3;
vMaxDel := 20;

/* For every index, validate structure */
OPEN cGetIdx;
LOOP
FETCH cGetIdx INTO vOwner,vIdxName;
EXIT WHEN cGetIdx%NOTFOUND;
/* Open DBMS_SQL cursor */
vCursor := DBMS_SQL.OPEN_CURSOR;
/* Set up dynamic string to validate structure */
vAnalyze := 'ANALYZE INDEX ' vOwner '.' vIdxName '
VALIDATE STRUCTURE';
DBMS_SQL.PARSE(vCursor,vAnalyze,DBMS_SQL.V7);
vNumRows := DBMS_SQL.EXECUTE(vCursor);
/* Close DBMS_SQL cursor */
DBMS_SQL.CLOSE_CURSOR(vCursor);
/* Does index need rebuilding? */
/* If so, then generate command */
SELECT height,lf_rows,del_lf_rows INTO vHeight,vLfRows,vDLfRows
FROM INDEX_STATS;
IF vDLfRows = 0 THEN
/* handle case where div by zero */
vDLfPerc := 0;
ELSE
vDLfPerc := (vDLfRows / vLfRows) * 100;
END IF;
IF (vHeight > vMaxHeight) OR (vDLfPerc > vMaxDel) THEN DBMS_OUTPUT.PUT_LINE('ALTER INDEX ' vOwner '.'
vIdxName ' REBUILD;');
END IF;
END LOOP;
CLOSE cGetIdx;
END;
/
Summary

One of the areas lacking in Oracle documentation pertains to rebuilding an index. Oracle’s ANALYZE INDEX VALIDATE STRUCTURE command provides a nice way to check an index to see if it is a candidate for rebuilding. This command does not affect the Oracle optimizer’s execution plan for queries that may use the index. The results in the INDEX_STATS are checked after issuing the VALIDATE STRUCTURE command. If an index has excessive height (greater than four) or a high number of deleted leaf rows (over 20% of the total), we rebuild the index.
A test case was studied to show the potential for improvements after rebuilding an index. In one example, the query executed 51% faster after rebuilding the index. While this example is specific to the test case, it does show the potential for improvement after rebuilding an index.
Keywords:

Index Index Rebuild Rebuild Index Online ALTER INDEX REBUILD ANALYZE INDEX VALIDATE STRUCTURE ANALYZE INDEX LF_ROWS DEL_LF_ROW

Thursday, April 17, 2008

Increase in Controlfile Size ???

In Oracle8i and higher, if the MAXDATAFILES limit is reached the controlfile will expand automatically.In Oracle 8i and higher, when you issue CREATE DATABASE or CREATE CONTROLFILE statements, the MAXDATAFILES parameter specifies an initial size of the datafile portion of the control file.

Later, if you add a file whose number exceeds MAXDATAFILES but is less than or equal to the value specified by the DB_FILES initialization parameter, the control file automatically expands to allow the datafile portion to accommodate more files.

We can have DB_FILES set to more than the value of MAXDATAFILES.However note that these DB_FILES parameter value should be within the OS kernel limits.

Refer to Note 144638.1 - Relationship Between Common Init.ora Parameters and Unix Kernel Parameters.

If you are adding datafiles to the database and is within the DB_FILES limit, you will get an error only if control file is unable to allocate more space. Recreating the control file is not required to increase the MAXDATAFILES parameter.It is not good to have a high DB_FILES parameter value ( much higher than required ). Increasing the value of DB_FILES increases the size of the PGA, or Program Global Area, which is allocated for every user process connected to ORACLE.
KeyWords:

CREATE CONTROLFILE CREATE DATABASE controlfile 144638.1 Unix Kernel Parameters OS kernel limits controlfile size