Friday, February 8, 2013

Preprocess clause External Tables - ORACLE_LOADER Access Driver

External tables exists since Oracle version 9i, enable to view a file as if it were data in a database. To simplyfy the Extract, Transform, and Load (ETL) processes oracle had added a parameter PREPROCCESS  to sqlloader for extrations by external tables. The PREPROCESSING  parameter is introduced in Oracle version 11Gr2 which allow ETL processes handle any type of file without additional out-of-line action. (For example execute a shell script, unzip).


Step 1. Create a external table

. For creating a external table the must be existing a DIRECTORY object.
create directory etl_dir as '/joords/etl_dir'; 
grant read, write on directory etl_dir to etladmin;
As user etladmin create the table
create table joords_data1
(
    cust_id      number,
    cust_name    varchar2(20),
    credit_limit number(10)
)
organization external
(
    type oracle_loader
    default directory etl_dir
    access parameters
    (
       records delimited by newline
       fields terminated by ","
    )
location ('joords_data1.txt')
);

Step 2. Create a text file "/joords/etl_dir/joords_data1.txt"

1,jos van den Oord,1000
2,Transfer - Solutions,2000

Step 3. Test the external table functionality

select * from joords_data1;
 
   CUST_ID CUST_NAME            CREDIT_LIMIT
---------- -------------------- ------------
         1 jos van den Oord             1000
         2 Transfer - Solutions         2000
Supose that this file was compressed or zipped. To load the file the first step to perform is to unzip or uncompress the file and the second step is to extract with SQLLOADER. By the preprocessor feature, it is now possible to do extraction of the file like unzip or uncompress and view the file by external table in the database in one command.

The PREPROCESSOR clause and setup

create a directory for all the preprocessor programs as sys
create directory execdir as '/joords/prep_tools';
Grant the correct privileges on the directory to the user. Never give the user WRITE privilege to the directiry => Security issues.
grant read , execute on directory execdir to etladmin;
Next step is to create the external table with the following command.
create table joords_data2
(
  cust_id     number,
  cust_name    varchar2(20),
  credit_limit number(10)
)
organization external
(
  type oracle_loader
  default directory etl_dir
  access parameters
  (
    records delimited by newline
    preprocessor execdir:'preprocess.sh'
    fields terminated by ","
  )
location ('joords_data2.txt.gz') 
);
View line 14 : preprocessor execdir:'preprocess.sh'

Create the Preprocess program on the location /joords/prep_tools => DIRECTORY execdir. file preprocess.sh
#!/bin/bash
cd /joords/etl_dir
/bin/gunzip -c $1
select * from joords_data2;
 
   CUST_ID CUST_NAME            CREDIT_LIMIT
---------- -------------------- ------------
         1 jos van den Oord             1000
         2 Transfer - Solutions         2000
If you select from the external table, the output will be similar to that of the earlier "select * from indata1;" query. The preprocessor passed the uncompressed contents of the joords_data2.txt (compressed) file on to the external table functionality This inline preprocessing unzip example uses a script. This preproccess command must be returning data like a stream. An important aspect is, the utility can’t accept any parameters. So if you pass parameters, you need a script. The preprocessor does not actually need a file but, rather, ir requires the output of the preprocessor program. You can write a preprocessor program to send the output of a dir command. The new pre-processor program, named preproc_dir.sh, has only the following two lines: File preproc_dir.sh
#!sh
ls /joords/etl_dir
create table joords_dir
(
  file_name    varchar2(200)
)
organization external
(
  type oracle_loader
  default directory etl_dir
  access parameters
  (
    records delimited by newline
    preprocessor execdir:'preproc_dir.sh'
  )
location ('joords_data2.txt.gz') 
);
select * from joords_dir;
FILE_NAME
-------------------------
joords_data1.txt
joords_data2.txt.gz
You see that the file in location can be anything that exists on the system where the oracle user can access.

Follow these guidelines to secure the ORACLE_LOADER access driver:

  1. Create a separate operating system directory to store the access driver preprocessors. You (or the operating system manager) may need to create multiple directories if different Oracle Database users will run different preprocessors. If you want to prevent one set of users from using one preprocessor while allowing those users access to another preprocessor, then place the preprocessors in separate directories. If all the users need equal access, then you can place the preprocessors together in one directory. After you create these operating system directories, in SQL*Plus, you can create a directory object for each directory. 
  2. Grant the operating system user ORACLE the correct operating system privileges to run the access driver preprocessor. In addition, protect the preprocessor program from WRITE access by operating system users other than the user responsible for managing the preprocessor program.
  3. Grant the EXECUTE privilege to each user who will run the preprocessor program in the directory object. Do not grant this user the WRITE privilege on the directory object. Never grant users both the EXECUTE and WRITE privilege for directory objects.
  4. Grant the WRITE privilege sparingly to anyone who will manage directory objects that contain preprocessors. This prevents database users from accidentally or maliciously overwriting the preprocessor program.
  5. Create a separate operating system directory and directory object for any data files that are required for external tables. Ensure that these are separate from the directory and directory object used by the access directory preprocessor. Work with the operating system manager to ensure that only the appropriate operating system users have access to this directory. Grant the ORACLE operating system user READ access to any directory that has a directory object with READ privileges granted to database users. Similarly, grant the ORACLE operating system user WRITE access to any directory that has the WRITE privilege granted to database users.
  6. Create a separate operating system directory and directory object for any files that the access driver generates. This includes log files, bad files, and discarded files. You and the operating system manager must ensure that this directory and directory object have the proper protections, similar to those described in Guideline 5. The database user may need to access these files when resolving problems in data files, so you and the operating system manager must determine a way for this user to read those files.
  7. Grant the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges sparingly. Users who have these privileges and users who have been granted the DBA role have full access to all directory objects.
  8. Consider auditing the DROP ANY DIRECTORY privilege.
  9. Consider auditing the directory object.