4 Apr 2012

ETL Vs ELT


ETL and ELT are two different approaches to data warehouse design and implementation. The industry commonly accepts three stages in the process of building a data warehouse; however, the order in which these stages are performed can have a fundamental effect on the meaning of these processes and on the overall outcome of the BI solution.
Extract – the process by which data is extracted from the data source
Transform – the transformation of the source data into a format relevant to the solution
Load – the loading of data into the warehouse

What is the difference between ETL and ELT?
In ETL the transformations are processed by the ETL tool, while in ELT the transformations are processed by the target data source (the RDBMS). So the important thing to remember is the side on which the transformations are processed.

ETL – Extract, Transform and Load
This is the traditional approach. Data is “Extracted” from the data sources using a data extraction tool. It is then “Transformed” using a series of transformation routines. This transformation process is largely dictated by the data format of the output. Data quality and integrity checking is performed as part of the transformation process, and corrective actions are built into the process. Transformations and integrity checking are performed in the data staging area. Finally, once the data is in the target format, it is then “Loaded” into the data warehouse ready for presentation.  

An ETL tool is made up of software that transforms data (with its own built-in routines and functions) and migrates it on most platforms, with or without source and target databases.
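To make the flow concrete, here is a minimal ETL sketch in Python (the table, column names and sample data are made up for the example, and SQLite stands in for the warehouse): the transformation runs inside the tool's own code, before anything reaches the target database.

```python
import sqlite3

# --- Extract: pull raw rows from the source (a list stands in for a source system)
source_rows = [("alice", "1000"), ("bob", "2500")]

# --- Transform: rules are applied by the ETL tool itself (here, plain Python):
#     uppercase the customer name and convert the amount from text to integer
transformed = [(name.upper(), int(amount)) for name, amount in source_rows]

# --- Load: write the already-transformed rows into the warehouse table
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount INTEGER)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", transformed)
warehouse.commit()
```

Note that the database only ever sees finished rows; all the transformation work happens on the ETL-tool side.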

ELT – Extract, Load and Transform
Data is “extracted” from the data sources into the “Staging Database” using a data extraction tool. Whilst in the staging area, integrity and business rules checks can be applied, and relevant corrections can be made. The source data is then loaded into the warehouse. In effect this provides a validated and cleaned offline copy of the source data in the data warehouse. Once in the warehouse, transformations are performed to re-shape the data into its target output format.  It consists of software that transforms and migrates data in a database engine, often by generating SQL statements and procedures and moving data between tables.
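By way of contrast, here is the same made-up example done ELT-style (again with SQLite standing in for the warehouse RDBMS): the raw data is loaded into a staging table first, and the transformation is an SQL statement executed by the database engine, moving data between tables.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# --- Extract + Load: raw source rows land in a staging table unchanged
warehouse.execute("CREATE TABLE stg_sales (customer TEXT, amount TEXT)")
warehouse.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                      [("alice", "1000"), ("bob", "2500")])

# --- Transform: the database engine does the work via an SQL statement
#     that re-shapes the data as it moves from staging to the target table
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount INTEGER)")
warehouse.execute("""
    INSERT INTO fact_sales
    SELECT UPPER(customer), CAST(amount AS INTEGER) FROM stg_sales
""")
warehouse.commit()
```

The end result is identical to the ETL version; only the place where the transformation executes has changed, which is exactly the distinction between the two approaches.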

A key appeal of ELT is loading performance, an advantage it holds because it operates at the infrastructure level, using SQL and the related procedural features of the relational database. In contrast, ETL works at the integration-server level, using dedicated graphical application development facilities, including transformation rules.



Some Big names in the domain:

Oracle has been firmly on the side of ELT and is upping the ante with appliances like Oracle/Sun Exadata. Oracle Data Integrator Enterprise Edition (ODI-EE) is a unified solution comprising Oracle Warehouse Builder and Oracle Data Integrator (ODI). 

Informatica, the pure-play data integrator and a traditional ETL player, now offers the option of deploying the transformation logic to either the Informatica engine (ETL) or the DBMS (ELT), or a hybrid (ETLT).

IBM’s InfoSphere DataStage is an ETL tool that can also be deployed in an ETLT style.

3 Apr 2012

Multiple RPDs on a Single BI Server


Now let’s assume that we have two RPDs, i.e. deba.rpd and test.rpd, each with its own catalog, and that both work properly when loaded individually and run independently.
Configuring the NQSCONFIG:
Let’s configure the NQSConfig.INI file as below:
[ REPOSITORY ]
RPD1 = deba.rpd, DEFAULT;
RPD2 = test.rpd;
Configuring the DSN:
Create two Oracle BI Server System DSNs, i.e.
DSN1 to point to default repository deba.rpd, and
DSN2 to point to test.rpd.
Since we are configuring two RPDs in parallel against a single BI Server instance, both DSNs point to the local server on the default port 9703. Port 9703 is the ODBC communication port between the BI Server and Presentation Services.
N.B.: Whatever repository is entered in the “Change the default repository” section of the DSN will be overridden by the DEFAULT setting in NQSConfig.INI. You can tick “Connect to Oracle BI Server to obtain default settings”, so that the RPD connection is established as per the definition in the .INI file after a BI Server restart.
Configuring the Catalog and instanceconfig.xml:
For both deba.rpd and test.rpd we have catalog folders under ../$OracleBIDataHome/web/catalog/, named “deba” and “test”, each holding a different set of reports so the two can be told apart easily. So let’s bring the “deba” catalog online after making the changes below in instanceconfig.xml.
<WebConfig>
  <ServerInstance>
    <DSN>DSN1</DSN>
    <CatalogPath>D:/OracleBIData/web/catalog/deba</CatalogPath>
  </ServerInstance>
</WebConfig>
Restart the services, log in against the “deba” catalog, and you can verify that queries run against deba.rpd and that your dashboards display the reports belonging to catalog “deba”.
Edit the XML to make the same kind of change, pointing to DSN2 and catalog “test”. It should work equally well, now running against the “test” repository.