Data Pump – introduced in Oracle Database 10g Release 1 – has been described in many different ways: from “Export/Import from Steroids” to the “Ultimate Data Mover”. Perhaps the use of superlatives creates a perception that this is only for data movements. Numerous presentations, articles, books and even Oracle manuals permeate that perception.
However, Data Pump is much more than just moving data. While that was probably the original and primary intent and it still remains the principal usage, the other subtle but significant capabilities of Data Pump remains under represented if not undiscovered. In this article you will learn about those lesser known capabilities and how to use them to achieve your strategic and tactical objectives. Put into good use Data Pump can ease a lot of daily and longer term non-data movement related tasks and provide a viable platform for alternate use.
This article assumes that the attendee is familiar with the Data Pump tool. This is not a introduction to the tool itself.
A Data Pump Primer
Even though the audience is assumed to be fully conversant with Data Pump, a quick primer will cement that understanding and pave way for a better understanding of the material to come. If you are familiar with the basics of the tool, you may want to skip this section in its entirety.
Data Pump is an evolution of the original export and import tools; and in fact, it’s also named Data Pump Export and Import. The purpose is similar – to export data out of the Oracle database and import back. The exported data is put in a dumpfile, which is portable across Operating Systems and even across Oracle versions (10g Release 1 and above). This export works by selecting the data from the tables using the SQL interface, similar to the original export; but at a much faster rate. My tests have shown Data Pump Export to be about 10 times faster than original export when conditions are right – with high degrees of parallelism, multiple spindles and so on. Similarly Data Pump Import is also faster than the original import, albeit at a lower scale.
The dumpfile produced can only be ready by Data Pump Import, not the original import. The same holds true for the reverse process as well, i.e. the dumpfile produced by the original export is not readable by DP Import. A major difference in the two tools is the location where the dumpfiles are produced. In the original export, the dumpfiles are produced in the same server the tool is called; so if you issue the exp command at a client machine, such as your laptop, the dumpfile is produced there. In contrast, the DP Export always produces the dumpfile on the database server regardless where it is called. You can specify a location by creating a directory object and specifying that in the command line.
Finally, for the sake of completeness, the commands are “expdp” for DP Export and “impdp” for DP Import.
Now that the introduction part is complete, let’s jump to the main topic – the usefulness of this tool beyond the data movement.
Regulatory Compliance
Would you want to store the source codes of the database codes such as stored procedures and DDLs of the objects? It would have been nice; but probably not strictly needed, just a fe years ago. But thanks to the myriads of laws and regulations covering the corporate America today, this has now become a necessity; not just a nice to have. For instance, laws like Sarbanes-Oxley (SOX) actually require that you have to have a system where you can track the changes to object definitions – commonly referred to as “metadata” in Oracle technical parlance. Please note, the word metadata has many different meanings in technical discussions. For instance, the original meaning of the term is “data about the data”, i.e. the meaning of the data elements such as columns and tables. For example, the metadata on the column STATUS in the table EMP could be: it refers to the status of the employee in the company at this time and it can have values such as ACTIVE, INACTIVE, SUSPENDED, etc. This description is commonly referred to as “metadata”. However, in Oracle-speak, we usually refer to the definition of the objects, i.e. the DDL statement to create the table, the source code of the procedures and packages as metadata. I have used the latter definition of the term.
Most regulations and internal controls in organizations require that the source code be stores in a repository which can be controlled and tracked. If the definitions changed, the system should be able to track it and report changes. The historical records of the metadata should be persevered. How do you accomplish this? One way to achieve this is constructing the sources and DDL from the data dictionary in regular intervals. For tables, you would select from DBA_TAB_COLUMNS (for columns), DBA_CONSTRAINTS, and DBA_CONS_COLUMNS (for constraints), DBA_TAB_PRIVS (for privileges granted to the table) and many others. For source code, you will need to select from DBA_SOURCES; te view DBA_TRIGGERS for triggers, DBA_VIEWS for view definitions and so on. The other option is extracting the DDL from the dictionary using DBMS_METADATA.GET_DDL packaged function. The solutions are technically possible; but may seem infeasible considering the effort. So, to address that deficiency, several tool manufactures have jumped into the fray. Needless to say, the tools do their job; but at a cost.
Enter Data Pump, to accomplish the same objective. Let’s see how.
Metadata Management
One of the least used parameters of Data Pump Export is CONTENT, which controls what is exported. It accepts three parameters:
DATA_ONLY – this option exports the data from tables.
METADATA_ONLY – this option does not export the data; merely exports the metadata of the database objects.
ALL –this is the default; both the metadata and the data (where applicable) are exported.
This parameter is the one you can use to create a baseline repository of the database sources and DDLs very quickly. All you have to do is to use:
expdp directory=dpump_dir dumpfile=metadata.dmp content=metadata_only
This creates the file metadata.dmp in the directory specified by the directlory object dpump_dir containing all the DDLs and source codes of the database objects. The file does not contain any data. This is your baseline repository of the source. If you ever need to reinstate the source, simply drop it from the database and import back the source from this baseline:
impdp directory=dpump_dir dumpfile=metadata.dmp
This will import the objects that are not present in the database now and skip the ones that are there.
In many cases, your database as a whole may not be subject to the baseline repository building; you may want to store objects of specific schemas in the repository; not all. For instance you may want to include the ACCMAIN and HRMAIN schemas but not the SYSTEM schema. The above command would be slightly modified in that case:
expdp directory=dpump_dir dumpfile=metadata.dmp content=metadata_only schemas=ACCMAIN, HRMAIN
By default, all the database objects in the schema are exported – tables, views, triggers, procedures, and so on. What if you want to store each object separately in its individual files? For instance, you may want to store the stored code such as triggers, procedures, etc. on a file named metadata_code.dmp and tables, views, etc. on a file named metadata_tabs.dmp. In this case, you can use the INCLUDE parameter to include only certain objects or types of objects in the export file; not all. The above command will then turn to:
expdp directory=dpump_dir dumpfile=metadata_code.dmp content=metadata_only schemas=ACCMAIN, HRMAIN include=PROCEDURE,FUNCTION,PACKAGE,TRIGGER
Here the dumpfile has only the code related metadata, not all.
You can also specify a specific procedure or finction to export. Suppose you want to keep a very crucial function – GET_RATE and a table – RATES, in a separate file – metadata_imp.dmp, you can use:
expdp directory=dump_dir dumpfile=metadata_imp.dmp schemas=ACCMAIN include=FUNCTION:\"=\'GET_RATE\'\",TABLE:\"=\'RATES\'\"
Note, I have added the backslash (“\”) before the double quotes (“) and the single quotes (‘), just so that the unix shell will not interpret the characters as commands. If you use Windows, you will need to remove the back slashes. Alternatively, a better way is to use a parameter file where you can list all the parameter, one per line and call the parameter using the PARFILE parameter. In this case, just create a file called exp.par with the following as the contents:
directory=dump_dir
dumpfile=metadata_imp.dmp
schemas=accman
include=FUNCTION:"='GET_RATE'"
include=TABLE:"='RATES'"
Then you can call the Data Pump Export as
$ expdp parfile=exp.par
In the parfile, you can see that I have placed two lines with the INCLUDE parameter; that’s allowed. I can place as many INCLUDE parameters as I need. The include parameter takes the values as
For instance, to include the functions FN1 and FN2, you would use:
include=function:"in ('FN1','FN2')"
or
include=function:"like 'FN%'"
Of course, you can also use this to get all but some:
include=function:"not in ('FN1','FN2')"
However, a more convenient alternative may be to use the EXCLUDE option. This is the inverse of INCLUDE and instructs the expdp command to exclude the objects. For example, you want to export the meta data for all objects excluding the functions FN1, FN2 and procedures PROC1 and PROC2. You will use:
exclude=function:"in ('FN1','FN2')"
exclude=procedure:"in ('PROC1','PROC2')"
To include all objects except all the tables in the export dump:
exclude=table
The syntax is the same as the include option. Note a very important difference in the usage of the parameters. Include is used to include a specific objects of a type; exclude is used to exclude a specific object of a type. They are not necessarily interchangeable. Consider these two options:
exclude=function:"in ('FN1','FN2')"
and
include=function:"not in ('FN1','FN2')"
They are not the same. The first option (the exclude one) exports all the objects – tables, views, triggers, etc. and leaves out just two functions – FN1 and FN2. The second option (the include one), on the other hand, only exports all the functions except FN1 and FN2. Only the functions are exported, not tables, views, etc. They are fundamentally different in their behavior. You have to consider this carefully when you construct your parameter file.
Creating an SQL File
Earlier you learned how to create a baseline and also selective exports of the database objects for safe keeping. However, the dumpfiles are not human readable, i.e. you can’t just glean over the files and get the metadata. You need another step – to extract the SQL out of the dumpfile. You can do that using the parameter SQLFILE in the Data Pump Import. Here is an example:
$ impdp directory=dump_dir dumpfile=metadata_full.dmp sqlfile=metadata_full.sql
This operation does not actually import the objects into the database; but creates an SQL script called metadata_full.sql containing all the objects in the dumpfile. Here is an excerpt from the file:
-- CONNECT ARUP
-- new object type path is: DATABASE_EXPORT/TABLESPACE
CREATE UNDO TABLESPACE "UNDOTBS1" DATAFILE
'+NDSDATA7/undotbs101.dbf' SIZE 17179869184,
'+NDSDATA8/undotbs102.dbf' SIZE 17179869184,
'+NDSDATA9/undotbs103.dbf' SIZE 17179869184
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
CREATE TABLESPACE "SYSAUX" DATAFILE
'+NDSDATA1/sysaux01.dbf' SIZE 4294967296,
'+NDSDATA1/sysaux02.dbf' SIZE 4194304000
NOLOGGING ONLINE PERMANENT BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO;
CREATE TEMPORARY TABLESPACE "TEMP1" TEMPFILE
'+NDSDATA1/temp1_01.dbf' SIZE 1073741824
......... and so on ........
You can also apply the filters to extract the SQL statements for specific objects as well. For instance, you can extract only the procedures, functions and packages; nothing else by using the INCLUDE parameter
include=procedure, package
You can include specific objects as well; not just specific types.
include=procedure:"='PROC1'"
Here is the file that is eventually produced:
-- CONNECT SYS
-- new object type path is: SCHEMA_EXPORT/PROCEDURE/PROCEDURE
-- CONNECT ARUP
CREATE PROCEDURE "ARUP"."PROC1"
as
begin
dbms_output.put_line ('Some text');
end;
/
-- new object type path is: SCHEMA_EXPORT/PROCEDURE/ALTER_PROCEDURE
ALTER PROCEDURE "ARUP"."PROC1"
COMPILE
PLSQL_OPTIMIZE_LEVEL= 2
PLSQL_CODE_TYPE= INTERPRETED
PLSQL_DEBUG= FALSE
REUSE SETTINGS TIMESTAMP '2006-08-11 13:27:55'
As with the export, you can exclude some objects as well:
exclude=procedure:"='PROC1'"
Building a Repository
Now that you learned how to extract the database objects into SQL files, you can build a system or a rudimentary tool to make the process resemble more like, well, a process. This can be effected somehow using the a shell script and creating the dumpfile and the SQL file named based on the date of creation. Here is how the shell script would look like:
DATEVAR=`date +"%m%d%y"`
DMPFILE=metadata_${DATEVAR}.dmp
SQLFILE=metadata_${DATEVAR}.sql
echo "arup"
expdp arup directory=tmp_dir dumpfile=${DMPFILE}
echo "arup"
impdp arup directory=tmp_dir dumpfile=${DMPFILE} sqlfile=${SQLFILE}
This script will create the file in the format metadata_
As a part of the maintenance, you may want to move the created files periodically. Here is a simple command that moves the files to a name
find . -name "*.dmp" -ctime +30 -exec mv {} {}.old\;
Or, you can just remove it
find . -name "*.dmp" -ctime +30 -exec rm {}\;
Using this, you can build a full version control and repository system in just hours; not months and without using any tool from outside of what you already has. This is one of the most powerful yet underutilized capabilities of the Data Pump.
Create a User Like …
Here is a request you may have received several times – create a user just like JOE. The requestor doesn’t provide any other details – no tablespace quotas, no grants and so on. All the requestor knows that JOE userid works; so another one like JOE will work as well. How do you take care of the request?
Of course, there are tools that can do this quickly; but you may not have them handy. Your only option at that point is to painstakingly create an SQL file caturing the user’s properties for data dictionary, all the objects and so on. While it’s possible, it’s tedious, prone to errors and sunstantilly time consuming.
Enter Data Pump to the rescue. You can create the user in just two commands:
$ expdp schemas=JOE content=metadata_only
$ impdp remap_schema=JOE:NEWUSER
The first command (export) is necessary only if you don’t have a full export dump or baseline now. If you have been using DP to take your metadata baselines, you don’t need to perform that step. The second command (the import) imports the metadata about the user JOE, but transforms it so that instead of JOE, the user NEWUSER is created. All grants, privileges, quotas, objects, etc. of JOE now goes to NEWUSER. In just a matter of minutes you created a user like another accurately.
If you have been using the repository as several versions with timestamps as shown in the previous section, you can even use that to create a user like JOE as of a certain date. For instance, JOE has DBA privileges now; but didn’t have them yesterday. Could we have NEWUSER sans DBA role? Sure; just use the dumpfile as of that day.
Create Tablespaces in Test as in Production
This is another of the challenges you may face – you have been asked to create a small test database, of the fraction of the size of the main database; but the structural elements must be identical. i.e. the tablespaces should be same, laid out in the same manner, across the same mount points and so on. How do you quickly do it?
Sure, you could write a PL/SQL loop to get all the datafile properties from the data dictionary (dba_data_files and dba_tablespaces) and construct the SQL script to create the tablespaces. It will work; but will take a significant amount of time, and, needless to say, is highly susceptible to errors.
Again, Data Pump comes to rescue. All you have to do is import the dumpfile and use the INCLUDE option.
$ expdp content=metadata_only
$ impdp include=tablespace
Like the previous case, the first command (expdp) is not necessary if you already have a dump. Instead of creating the tablespace directly, you may also want to see the script by using the SQLFILE option:
$ impdp include=tablespace sqlfile=c.sql
This will create a file called c.sql, which will have the tablespace creation scripts. You ca edit them, if needed and run them from SQL*Plus.
A slight variation of the above may be more practical. In many cases, the test database may not have or need the size of the production database tablespaces. While creating the tablespaces, you may want to reduce the size of the datafiles by a certain percentage, say 90%. One option is to create a SQL file, edit it and manually change the sizes of all datafiles to 10% of the production size. While it’s doable, it’s not elegant as it adds time and risk of error. You can achieve that in a simple option – TRANSFORM.
$ impdp include=tablespace sqlfile=c.sql transform=pctspace:10
The parameter transform=pctspace:10 instructs the IMPDP job to change the size of the datafiles to 10% of the original size. Here is the excerpt from the SQL file created without this parameter:
CREATE UNDO TABLESPACE "UNDOTBS1" DATAFILE
'/u01/undotbs101.dbf' SIZE 17179869184,
And here is the excerpt when the parameter was applied:
CREATE UNDO TABLESPACE "UNDOTBS1" DATAFILE
'/u01/undotbs101.dbf' SIZE 1717986918,
As you can see, the size with the parameter (1,717,986,918) is about 10% of the original size (17,179,869,184). You will learn more about the TRANSFORM parameter later in this article.
Data File Name Change
You are moving some tables from another database and the tablespace in which these tables exist is also new in the target database. Importing those tables will create the tablespaces in the target database too; but here is a small problem – the filesystems are different. For example, the datafile for tablespace was “/u01/data1.dbf” but the target database does not have a mount point called /u01; instead it has “/u02”. Ordinarily you would have to create the tablespace manually and then import the table into the tablespace.
Data Pump eliminates the extra step in one elegant motion. All you have to do is use the REMAP_DATAFILE option as follows:
$ impdp remap_datafile=‘/u01/data1.dbf':'/u02/data1.dbf'
This creates the same datafile as /u02/data1.dbf. you can use this option to recreate the test data even across different platforms. For instance, the following option in the option converts datafiles from a unix to Windows format.
remap_datafile='/u01/data1.dbf':'c:\oradata\data1.dbf'
Segment Transforms
This is also a common problem in creating a test database from production. The test database may be very small compared to the production volume. Therefore, when you export and import the tables, they may fail if the initial extents are defined too large to fit in the test database. Instead of creating the table manually prior to importing, you will find Data Pump doing that work for you. You can remove the storage clauses of the tables, MVs etc. using a simple parameter TRANSFORM to impdp. The parameter has the following format:
transform=segment_attributes:[N/Y]:
transform=segment_attributes:n:table
When put it in impdp, the table creation does not include physical attributes, storage attributes, tablespaces and logging features. Here is a small example to illustrate the concept. First do a normal import:
$ impdp arup/arup directory=tmp_dir dumpfile=metadata_full.dmp sqlfile=trans.sql include=table:\"=\'TRANS\'\"
Here is the DDL as shown in the SQL file trans.sql:
CREATE TABLE "ARUP"."TRANS"
( "TRANS_ID" NUMBER,
"TRANS_DT" DATE,
"PRODUCT_CODE" NUMBER,
"STORE_ID" NUMBER,
"TRANS_AMOUNT" NUMBER(12,2)
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(
BUFFER_POOL DEFAULT)
TABLESPACE "USERS"
PARTITION BY RANGE ("TRANS_DT")
(PARTITION "Y05Q1" VALUES LESS THAN (TO_DATE(' 2005-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q2" VALUES LESS THAN (TO_DATE(' 2005-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q3" VALUES LESS THAN (TO_DATE(' 2005-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "Y05Q4" VALUES LESS THAN (TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" NOCOMPRESS ,
PARTITION "PMAX" VALUES LESS THAN (MAXVALUE)
Now, use the same command using the transform option:
$ impdp arup/arup directory=tmp_dir dumpfile=metadata_full.dmp sqlfile=metadata_full.sql include=table:\"=\'TRANS\'\" transform=segment_attributes:n:table
If you see the SQL file:
CREATE TABLE "ARUP"."TRANS"
( "TRANS_ID" NUMBER,
"TRANS_DT" DATE,
"PRODUCT_CODE" NUMBER,
"STORE_ID" NUMBER,
"TRANS_AMOUNT" NUMBER(12,2)
)
PARTITION BY RANGE ("TRANS_DT")
(PARTITION "Y05Q1" VALUES LESS THAN (TO_DATE(' 2005-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
NOCOMPRESS ,
PARTITION "Y05Q2" VALUES LESS THAN (TO_DATE(' 2005-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) N
OCOMPRESS ,
PARTITION "Y05Q3" VALUES LESS THAN (TO_DATE(' 2005-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) N
OCOMPRESS ,
PARTITION "Y05Q4" VALUES LESS THAN (TO_DATE(' 2006-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) N
OCOMPRESS ,
PARTITION "PMAX" VALUES LESS THAN (MAXVALUE) NOCOMPRESS ) ;
Note how the storage options such as initial/next/max extents, buffer pool, etc. have been removed, even from the partitions. This way there is no conflict with the target database’s storage parameters and the table can be created easily.
If you want to remove storage parameters from all objects, not just tables, leave the third part of the value in the TRANSFORM option:
transform=segment_attributes:n
Reducing Size
On the same line as the above, suppose you want to still keep the initial extent clause in the CREATE TABLE script but want to make it much smaller than it’s in production database. Ordinarily you would have done it using the more laborious manual approach, i.e. creating the SQL file, calculating the desired initial extent and editing the SQL file to plug in those numbers. This may be technically possible but may be infeasible.
A special modifier “PCTSPACE” in the TRANSFORM parameter shown above does the trick. Specifying a PCTSPACE puts the initial extent size as that much percentage of the production database. Here is how you can specify a 10% size of the initial extent:
$ expdp transform=pctspace:30 tables=accounts
In the above case, if the original INITIAL extent of the table ACCOUNTS was 100M, the import creates the table with INITIAL as 10M, or 10% of the original.
Sub-setting a Table
This is also a very common request – create a sub-set of a table in production in the test database so that the tests can be done against it. There could be several reasons why the full table couldn’t be used – the most common being lack of space in test database. So, how can you create a smaller portion of the table, say, 10% of the original rows?
There are two options while creating a sub-set of the table:
1. Take a random sample of 10% of the rows
2. Take the specific rows that satisfy some criteria
Random Sample
To export a random sample of 10% of the rows of all the tables, use the SAMPLE parameter in expdp command as shown below:
$ expdp SAMPLE=10
This takes a sample from all the tables. Suppose you want to take a sample from the table ACCOUNTS in ARUP schema only:
$ expdp SAMPLE=ARUP.ACCOUNTS:10
Rows Satisfying Some Condition
Suppose you want to create a subset of the table based on some criteria, e.g. “SALARY>10000”, you would issue
$ expdp query=employees:"where salary>10000" tables=employees
This can also take the ORDER BY clause to create the dumpfile in a sorted order. Suppose you want to dump the EMPLOYEES table order by SALARY, here is how the complete command looks like (with the unix required escape characters – backslahes):
$ expdp arup/arup directory=demo_dir dumpfile=employees.dmp query=employees:\"where salary\>10000\ order by salary" tables=employees
If you already have a full dump, you may want to import from that instead of creating a subset of the table in export. You can also use the QUERY parameter in impdp as well.
$ impdp QUERY=CUSTOMERS:“WHERE TOTAL_SPENT > 10"
This can also take the ORDER BY clause. This approach can be used to quickly load parts of the production table to test database. Please note that this does not take care of referential integrity constraints, i.e. the loaded rows do not necessarily satisfy all the foreign key relationships. For instance, if you import 10% of the tables DEPARTMENTS and EMPLOYEES, there is no guarantee that all the department number of the employees in the 10% of the rows of the EMPLOYEES table will be in the 10% of the DEPARTMENTS table. So, the import will fail. In such a case, you can either disable the constraint (not a very useful option) or use the QUERY parameter to select specific rows.
Refresh a Table Definition
You maintain two different databases, say, Development and Integration. From time to time you want to make sure the definitions of the tables are in sync between the two databases. And suddenly you realize that the databases may be out of sync in the structure of the objects. How can you refresh the table in INT so it matches the copy in DEV? Of course you can drop and recreate all the objects; but that’s a task easier said than done. You need to create the SQL script of all the objects – tables, views, procedures, triggers, grants made and received and so on and so forth. You could just drop everything and recreate; but then that means dropping something that may be in INT only and that will not be recreated.
As always, you can use Data Pump to achieve this in just one command. Just take an export of the metadata only. In this example, we assume that you want to refresh only the table ACCOUNTS. After the export, just import the table with the option table_exists_action set to “replace”, which drops the table and recreate it from the metadata contained in the dumpfile.
$ expdp tables=ACCOUNTS content=metadata_only
$ impdp full=y table_exists_action=replace
You can also use this technique to reinstate the copy of the table from the repository as of a certain date. While importing you may also want to modify the storage parameters of the table by using the TRANSFORM parameter as the storage parameters may not be adequate for the target database.
Changing Table’s Owner
This is not a everyday case but it does happen. Someone created a table in the wrong schema – SCOTT, instead of ACCMAIN. Subsequent operations have placed dependent objects on it – triggers, constraints, indexes, etc.; and SCOTT has granted privileges to others. When the mistake was realized, it was too late to make a quick change. What can you do to quickly change the owner.
Unfortunately, in Oracle there is no such thing called changing the owner of the table. The only option is a series of steps:
1. create the SQL to create table;
2. create the table in the new schema;
3. create all grants, triggers, constraints and so on;
4. export the table,
5. import into the old schema
6. drop the table in the old schema
Although feasible, this series of steps is very laborious and error-prone. Of course there are tools to do this; but again, we assume that we don’t have access to these tools. Wedo have access to, however, the Data Pump tool, which accomplishes the task in just one line, using the parameter “remap_schema”.
$ impdp remap_schema=“OLDU:NEWU” network_link=maindb directory=…
Here you changed the owner from OLDU to NEWU. Here is an important point to note here – the use of the parameter “network_link”. It’s a database link created pointing to the same database. The use of this parameter saves us from running an expdp to create a dumpfile. The parameter lets impdp get the metadata across the database link and bypass the dumpfile creation step.
External Tables
External tables are not new in the database; they were first introduced in Oracle 9i. In essence, they are text files outside the database, but are visible to the database users as simple tables. When users select from the table, the actual text file is read and the rows returned as if they are rows of a regular table. The table can only be queried; not updated.
You can also create an ASCII file from a table or set of tables and create an external table on that file to make it visible to the applications. This comes handy while sending some data form one system to another. Since the external tables do not need to be loaded in the database to be visible, this technique makes the data immediately available to the end users and relives stress on the database. The real challenge is how to make this file that will be eventually be used as an external table. You can use the simple SQL*Plus to select all the columns and spool to a file. This approach is the simplest but also the slowest. The other option is to use OCI or Pro*C program to create these files. This makes the file creation quicker; but also adds a lot of time to the process.
Many people are not aware of the fact that you can use Data Pump to create the external table file. This file is not ASCII; it’s binary; but it’s readable across platforms. Let’s see an example: Here we are creating a file trans_ext.dmp to be used as an en external table from the table TRANS.
create table trans_ext (
trans_id,
trans_dt,
product_code,
store_id,
trans_amount
)
organization external
(
type oracle_datapump
default directory tmp_dir
location ('trans_ext.dmp')
)
as
select * from trans
order by trans_id;
Once the table is created, you can use the file as an external table – trans_external – as:
create table trans_external (
trans_id number,
trans_dt date,
product_code number,
store_id number,
trans_amount number(12,2)
)
organization external
(
type oracle_datapump
default directory tmp_dir
location ('trans_ext.dmp')
);
And that’s it; the file is now available as an external table. You would select from it as you would select from a regular table:
select *
from trans_external;
So, how does all this make it any better? The user can always select from the original table TRANS, instead of TRANS_EXTERNAL; so there does not seem to be any advantage in using this approach.
The advantage comes when the user does not have direct access to the database. In order to send the data to a remote user, you have to send it using Data Pump Export. At the other end, the user must import it into the database before he can start using it, which requires some technical expertise. On the other hand, the external table approach takes no technical expertise; the user can simply make the file available to the same server as the database runs on, using ftp, copying the file or something as trivial as mounting the CD on the CD-ROM. As soon as the server sees the file, the external table is immediately visible.
Uses of External Tables
Offline Receiver – in the cases where the receiver of the data is not online with the database, this is the best way to send data without strong technical expertise. The common example is a publication industry, which sends out period updates to the data it sells. Instead of importing the new data, the publisher merely creates an external table file, burns a CD with the file an sends it to the readers. The application on the database reads the external table. When the reader pops in the new CD, the application immediately sees the new data; no data manipulation need occur. Since the data in the external table can’t be written to, the system provides a read only mechanism to the process.
Ease of Creation – You don’t have to write any code to create the files.
Ordering and Grouping – While creating the external table files, you can use WHERE, ORDER BY and GROUP BY clauses, thereby sorting and grouping data. This approach can be used in ETL processes where the data can be loaded without an intermediate sorting/grouping step. One of the best uses of this technique is the creation of Index Organized Tables, described in a sub-section below.
Faster Loading – The regular Data Pump Import still goes through the SQL layer, making it somewhat slower. The fastest way to load a table is still the APPEND method. So, you could use Data Pump export to create the external table file, create external table and load into the main table using the APPEND hint:
insert /*+ append */ into trans
select * from trans_external;
This allows the Direct Path Insert, bypassing the SQL processing layer and minimizing undo and redo and significantly improving the processing time.
Creating IOTs – An Index Organized Table is a table built on a primary key index. In a normal PK index, only the PK columns and the pointer to the corresponding row in the table are stored. In the IOT, the entire row is stored along with the PK index columns. When a user queries a normal table using the PK columns in the WHERE condition, Oracle first gets the rowid from the PK index and gets the row from the table – a two step operation. In an IOT, since the entire row is stored along with the PK, the second step is eliminated.
When you create an IOT, the rows can be loaded in any manner; but if the rows are already in a sorted format, the sorting from the IOT is considerably faster. This can be easily done in Data Pump using the external table approach. While creating the external table files, just use the ORDER BY clause on the PK columns and then load them; the rows will be in sorted order.
General Tips ‘n Tricks
Parallelizing
To make DP Exports run faster, you can use the parallel option to execute the job in parallel. However, if you do use the parallel option, make sure you have that many files as parallel degree; otherwise the parallelism will not take effect.
$ expdp tables=CASES directory=DPDATA1 parallel=4 dumpfile=\(expCASES1.dmp,epCASES2.dmp,expCASES3.dmp,expCASES4.dmp\)
If you do not wan to name the files individually, you can use the %U wildcard.
$ expdp tables=CASES directory=DPDATA1 dumpfile=expCASES_%U.dmp parallel=4
Monitoring
One of the often asked questions is how to monitor the performance of the DP jobs that are running now. There are a few ways you can do it. One is identifying and monitoring the Oracle sessions of the DP jobs. The information about the Data Pump jobs are found in the view DBA_DATAPUMP_SESSIONS.
select sid, serial#
from v$session s, dba_datapump_sessions d
where s.saddr = d.saddr;
Once you identify the session, you can see the parallel query slave sessions spawned off by the main DP session by querying the V$PX_SESSION view.
select sid from v$px_session
where qcsid =
When DP jobs run, they post important runtime information in the view V$SESSION_LONGOPS. You can view all that by issuing:
select sid, serial#, sofar, totalwork
from v$session_longops
where opname = '
and sofar != totalwork;
Troubleshooting
Like anything else, DP sometimes does not work the way it is supposed to, or perceived to –leading to the next section – Troubleshooting. Many problems can be easily diagnosed using the TRACE on the tool’s innards. There are two different types of trace on Data Pump. The trace built into the tool itself; and the SQL trace on the sessions of DP. Let’s see how each one works.
Data Pump TRACE
The trace is given as a parameter in the command line of the tool in the following format:
TRACE=
The
1FF – Full Tracing
048 – Standard Tracing
In many cases the Standard tracing provides all the information you want for a detailed troubleshooting; but sometimes, and usually at the direction of the Oracle Support Services, you may want to enable the Full tracing. Here is an example of the standard tracing.
$ impdp trace=0480300 schema=... and so on ...
This produces a trace file in the user_dump_dest directory, which shows in detail different commands and where they are failing, if at all they are failing.
SQL Trace
This is the usual SQL Tracing you may have been familiar with; however there is a small challenge. Since DP jobs kick off the sessions themselves, you don’t have an opportunity to set the trace in the session by specifying “alter session set sql_trace = true”. Instead, you must enable an event in the session from outside using the dbms_system.set_ev procedure.
First, get the SID and Serial# of the Data Pump session:
select sid,serial#,username,program
from v$session
where upper(program) like '%(DW%)'
or upper(program) like '%(DM%)';
Then trace the session (place the SID and Serial# below)
SQL> exec dbms_system.set_ev(
This will produce a trace file in the user_dump_dest directory of the database server.
Conclusion
Data Pump, although originally designed to be a tool to move a lot of data quickly and across multiple database versions and platforms, has some quite useful functionalities much beyond that. In this article you learned a few of those non-traditional uses of the tool that prove extremely useful in the long and short term objectives, without the use of any other expensive tool, whether from Oracle or another third party. The capabilities of the tool are not limited to those listed in the article; but only to your imagination