Monday 29 June 2015

Router Transformation



Router Transformation:
  • Active and connected transformation.
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
Example: If we want to keep employees of France, India, US in 3 different tables, then we can use 3 Filter transformations or 1 Router transformation.
Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation.
A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that we configure in the Designer.
Working with Groups
A Router transformation has the following types of groups:
  • Input: the group that contains the input ports.
  • Output: user-defined groups and the default group. We cannot modify or delete output ports or their properties.
User-Defined Groups: We create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter Condition. We can create and edit user-defined groups on the Groups tab with the Designer. Create one user-defined group for each condition that we want to specify.
The Default Group: The Designer creates the default group after we create the first user-defined group. The Designer does not allow us to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group.
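The group evaluation described above can be sketched in Python. This is only an illustration, not Informatica code; the sample rows and group names (EMP_10, EMP_20) are hypothetical:

```python
# Sketch of Router group evaluation: each row is tested against every
# user-defined group condition; rows matching no condition go to DEFAULT.
rows = [
    {"EMPNO": 1, "DEPTNO": 10},
    {"EMPNO": 2, "DEPTNO": 20},
    {"EMPNO": 3, "DEPTNO": 30},
]

# One group filter condition per user-defined group.
groups = {
    "EMP_10": lambda r: r["DEPTNO"] == 10,
    "EMP_20": lambda r: r["DEPTNO"] == 20,
}

routed = {name: [] for name in groups}
routed["DEFAULT"] = []

for row in rows:
    matched = False
    for name, condition in groups.items():
        if condition(row):
            routed[name].append(row)
            matched = True
    if not matched:           # all conditions evaluate to FALSE
        routed["DEFAULT"].append(row)

print(routed["DEFAULT"])      # the DEPTNO 30 row lands here
```

Note that a row satisfying more than one condition is copied to every matching group, which is why router groups can be mutually inclusive.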
Example: Filtering employees of Department 10 to EMP_10, Department 20 to EMP_20 and rest to EMP_REST
  • Source is EMP Table.
  • Create 3 target tables EMP_10, EMP_20 and EMP_REST in the shared folder. The structure should be the same as the EMP table.
  • Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_router_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select Router from list. Give name and click Create. Now click done.
6. Pass ports from SQ_EMP to Router Transformation.
7. Edit Router Transformation. Go to Groups Tab
8. Click the Groups tab, and then click the Add button to create a user-defined Group. The default group is created automatically.
9. Click the Group Filter Condition field to open the Expression Editor.
10. Enter a group filter condition. Ex: DEPTNO=10
11. Click Validate to check the syntax of the conditions you entered.
12. Create another group for EMP_20. Condition: DEPTNO=20
13. The rest of the records not matching the above two conditions will be passed to DEFAULT group. See sample mapping
14. Click OK -> Click Apply -> Click Ok.
15. Now connect the ports from router to target tables.
16. Click Mapping -> Validate
17. Repository -> Save
  • Create Session and Workflow as described earlier. Run the Workflow and see the data in the target tables.
  • Make sure to give connection information for all 3 target tables.


Sample Mapping:
[Image: sample Router mapping]
Difference between Router and Filter:
A Filter transformation drops the rows that do not meet the condition, so rejected data cannot be passed forward. A Router transformation can pass rejected data forward: rows that fail every condition go to the Default group.
Questions:

1. What is a router transformation?

A router is used to filter the rows in a mapping. Unlike filter transformation, you can specify one or more conditions in a router transformation. Router is an active transformation.

2. How to improve the performance of a session using router transformation?

Use router transformation in a mapping instead of creating multiple filter transformations to perform the same task. The router transformation is more efficient in this case. When you use a router transformation in a mapping, the integration service processes the incoming data only once. When you use multiple filter transformations, the integration service processes the incoming data for each transformation.
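The single-pass advantage can be sketched in Python by counting how many times the source is read. The row counts and conditions here are illustrative only:

```python
# Why a Router beats several Filters: one pass over the source versus
# one pass per Filter transformation.
rows = list(range(1000))
reads = 0

def read_source():
    """Simulate reading every source row once."""
    global reads
    for r in rows:
        reads += 1
        yield r

# Three separate Filter transformations: the source is processed three times.
evens  = [r for r in read_source() if r % 2 == 0]
tens   = [r for r in read_source() if r % 10 == 0]
others = [r for r in read_source() if r % 2 and r % 10]
filter_reads = reads

# One Router transformation: the same three groups from a single pass.
reads = 0
evens2, tens2, others2 = [], [], []
for r in read_source():
    if r % 2 == 0:
        evens2.append(r)
    if r % 10 == 0:
        tens2.append(r)
    if r % 2 and r % 10:
        others2.append(r)
router_reads = reads

print(filter_reads, router_reads)   # three passes vs one pass
```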

3. What are the different groups in router transformation?

The router transformation has the following types of groups:
  • Input
  • Output

4. How many types of output groups are there?

There are two types of output groups:
  • User-defined group
  • Default group
5. Where do you specify the filter conditions in the router transformation?

You can create the group filter conditions in the groups tab using the expression editor.

6. Can you connect ports of two output groups from router transformation to a single target?

No. You cannot connect more than one output group to a single target or to a single input group transformation.


Saturday 27 June 2015

Filter Transformation

Filter Transformation:
  • Active and connected transformation.
We can filter rows in a mapping with the Filter transformation. We pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.
Example: to filter records where SAL>2000
  • Import the source table EMP in the Shared folder. If it is already there, then don't import it again.
  • In shared folder, create the target table Filter_Example. Keep all fields as in EMP table.
  • Create the necessary shortcuts in the folder.
Creating Mapping:
  1. Open folder where we want to create the mapping.
  2. Click Tools -> Mapping Designer.
  3. Click Mapping -> Create -> Give mapping name. Ex: m_filter_example
  4. Drag EMP from source in mapping.
  5. Click Transformation -> Create -> Select Filter from list. Give name and click Create. Now click done.
  6. Pass ports from SQ_EMP to Filter Transformation.
  7. Edit Filter Transformation. Go to Properties Tab
  8. Click the Value section of the Filter condition, and then click the Open button.
  9. The Expression Editor appears.
  10. Enter the filter condition you want to apply.
  11. Click Validate to check the syntax of the conditions you entered.
  12. Click OK -> Click Apply -> Click Ok.
  13. Now connect the ports from Filter to target table.
  14. Click Mapping -> Validate
  15. Repository -> Save
Create Session and Workflow as described earlier. Run the workflow and see the data in target table.
How to filter out rows with null values?
To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES functions to test the value of the port. For example, if we want to filter out rows that contain NULLs in the FIRST_NAME port, use the following condition:
IIF (ISNULL (FIRST_NAME), FALSE, TRUE)
This condition states that if the FIRST_NAME port is NULL, the return value is FALSE and the row should be discarded. Otherwise, the row passes through to the next transformation.
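In Python terms (the sample rows are hypothetical), the condition keeps a row only when FIRST_NAME is not NULL:

```python
# Python analogue of IIF(ISNULL(FIRST_NAME), FALSE, TRUE): keep a row
# only when FIRST_NAME is not NULL.
rows = [
    {"EMPNO": 1, "FIRST_NAME": "ALLEN"},
    {"EMPNO": 2, "FIRST_NAME": None},      # NULL -> should be dropped
    {"EMPNO": 3, "FIRST_NAME": "WARD"},
]

def keep(row):
    # ISNULL(FIRST_NAME) -> FALSE (drop); otherwise TRUE (pass through)
    return row["FIRST_NAME"] is not None

passed = [r for r in rows if keep(r)]
print([r["EMPNO"] for r in passed])   # [1, 3]
```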
Performance tuning:
A Filter transformation is used to filter out unwanted rows based on conditions we specify.
  1. Use the Filter transformation as close to the source as possible so that unwanted data gets eliminated sooner.
  2. If elimination of unwanted data can be done by the Source Qualifier instead of a Filter, then eliminate it at the Source Qualifier itself.
  3. Use conditional filters and keep the filter condition simple, involving TRUE/FALSE or 1/0.


Questions:

1. What is a filter transformation?

A filter transformation is used to filter out the rows in mapping. The filter transformation allows the rows that meet the filter condition to pass through and drops the rows that do not meet the condition. Filter transformation is an active transformation.

2. Can we specify more than one filter condition in a filter transformation?

We can only specify one condition in the filter transformation. To specify more than one condition, we have to use a router transformation.

3. In which case a filter transformation acts as passive transformation?

If the filter condition is set to TRUE, then it passes all the rows without filtering any data. In this case, the filter transformation acts as passive transformation.

4. Can we concatenate ports from more than one transformation into the filter transformation?

No. The input ports for the filter must come from a single transformation.

5. How to filter the null values and spaces?

Use the ISNULL and IS_SPACES functions
Example: IIF(ISNULL(commission),FALSE,TRUE)

6. How session performance can be improved by using filter transformation?

Keep the filter transformation as close as possible to the sources in the mapping. This allows the unwanted data to be discarded and the integration service processes only the required rows. If the source is a relational source, use the source qualifier to filter the rows.

7. What is the difference between Router and Filter?



  • Router transformation divides the incoming records into multiple groups based on conditions. Such groups can be mutually inclusive (different groups may contain the same record). Filter transformation restricts or blocks the incoming record set based on one given condition.
  • Router transformation itself does not block any record. If a record does not match any of the routing conditions, it is routed to the default group. Filter transformation does not have a default group; if a record does not match the filter condition, the record is blocked.
  • Router acts like a CASE..WHEN statement in SQL (or a switch..case statement in C). Filter acts like a WHERE condition in SQL.


Friday 26 June 2015

Aggregator Transformation

Transformation - Aggregator
Aggregator is an active transformation: output from the aggregator can be different from the input. The Designer allows aggregate functions only in this transformation. The following functions can be used:
1. MIN
2. MAX
3. AVG
4. COUNT
5. FIRST
6. LAST
7. MEDIAN
8. PERCENTILE
9. STDDEV
10. SUM
11. VARIANCE
Along with these aggregate functions, you can use other row level functions such as IIF, DECODE etc.
In an Aggregator transformation, at least one port has to be selected as a group by column. By default, the aggregator will return the last value for a port (if there is more than one record for the group by column). The aggregator will also sort the data in ascending order on the group by port.
NOTE: If the primary key column of the source is used as a group by port, then the aggregator will work like a sorter transformation.
Nested aggregate ports cannot be used in the Aggregator. That is, you cannot get COUNT(*) in one port and use that value in another aggregator port; this will invalidate the mapping.
The Aggregator has a property SORTED INPUT. If you check this property, then the aggregator assumes that data is coming in sorted order (on the group by ports). If it is not, the session will fail at run time. Sorted input improves aggregator performance.
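The benefit of sorted input can be sketched in Python: with rows presorted on the group by port, a single running total is enough, and each group can be emitted as soon as the key changes instead of caching every group. The (DEPTNO, SAL) values below are illustrative:

```python
# Streaming aggregation over input already sorted on the group-by key.
# Only one group is held in memory at a time.
rows = [(10, 800), (10, 1300), (20, 2975), (20, 3000), (30, 950)]

sums = {}
current_key, current_sum = None, 0
for deptno, sal in rows:
    if deptno != current_key:
        if current_key is not None:
            sums[current_key] = current_sum   # emit the finished group
        current_key, current_sum = deptno, 0
    current_sum += sal
if current_key is not None:
    sums[current_key] = current_sum           # emit the last group

print(sums)   # {10: 2100, 20: 5975, 30: 950}
```

If the input arrived unsorted, this key-change logic would emit wrong partial sums, which mirrors why the session fails when Sorted Input is checked but the data is not actually sorted.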
Aggregator Transformation in Informatica
Aggregator transformation is an active transformation used to perform calculations such as sums, averages and counts on groups of data. The integration service stores the data group and row data in the aggregate cache. The Aggregator transformation provides more advantages than SQL; for example, you can use conditional clauses to filter rows.
Creating an Aggregator Transformation:
Follow the below steps to create an aggregator transformation
  • Go to the Mapping Designer, click Transformation in the toolbar -> Create.
  • Select the Aggregator transformation, enter a name and click Create. Then click Done. This creates an aggregator transformation without ports.
  • To create ports, you can either drag ports to the aggregator transformation or create them in the Ports tab of the aggregator.


Configuring the aggregator transformation:


You can configure the following components in the aggregator transformation:


  • Aggregate Cache: the Integration Service stores the group values in the index cache and row data in the data cache.
  • Aggregate Expression: you can enter expressions in an output port or variable port.
  • Group by Port: this tells the Integration Service how to create groups. You can configure input, input/output or variable ports for the group.
  • Sorted Input: this option can be used to improve session performance. You can use it only when the input to the aggregator transformation is sorted on the group by ports.


Properties of Aggregator Transformation:
The Aggregator transformation has the following properties:
  • Cache Directory: directory where the Integration Service creates the index and data cache files.
  • Tracing Level: amount of detail displayed in the session log for this transformation.
  • Sorted Input: indicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.
  • Aggregator Data Cache Size: default cache size is 2,000,000 bytes. The data cache stores row data.
  • Aggregator Index Cache Size: default cache size is 1,000,000 bytes. The index cache stores group by ports data.
  • Transformation Scope: specifies how the Integration Service applies the transformation logic to incoming data.


Group By Ports:
The integration service performs aggregate calculations and produces one row for each group. If you do not specify any group by ports, the integration service returns one row for all input rows. By default, the integration service returns the last row received for each group along with the result of aggregation. By using the FIRST function, you can specify the integration service to return the first row of the group.
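The default last-row behavior can be sketched in Python (the sample rows are hypothetical):

```python
# Sketch of group-by behavior: the aggregator returns the LAST row
# received per group alongside the aggregate result; FIRST(...) would
# return the first row instead.
rows = [
    {"DEPTNO": 10, "ENAME": "CLARK", "SAL": 2450},
    {"DEPTNO": 10, "ENAME": "KING",  "SAL": 5000},
    {"DEPTNO": 20, "ENAME": "SMITH", "SAL": 800},
]

result = {}
for row in rows:
    key = row["DEPTNO"]
    if key not in result:
        result[key] = {"FIRST_ENAME": row["ENAME"], "SUM_SAL": 0}
    result[key]["LAST_ENAME"] = row["ENAME"]   # overwritten -> last row wins
    result[key]["SUM_SAL"] += row["SAL"]

print(result[10])   # LAST_ENAME is KING, SUM_SAL is 7450
```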
Aggregate Expressions:
You can create aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. You can use the following aggregate functions in the Aggregator transformation:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE


Examples: SUM(sales), AVG(salary)
Nested Aggregate Functions:
You can nest one aggregate function within another aggregate function. You can either use single-level aggregate functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an aggregator transformation contains both single-level and nested aggregate functions. If you want to create both single-level and nested aggregate functions, create separate aggregate transformations.
Examples: MAX(SUM(sales))
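A two-level nesting such as MAX(SUM(sales)) amounts to summing within each group and then taking the maximum across the per-group sums. A Python sketch with hypothetical sales data:

```python
# MAX(SUM(sales)) in two steps: inner SUM per group, outer MAX over
# the per-group sums.
sales = {"FR": [100, 250], "IN": [400, 50], "US": [300, 300]}

per_group_sum = {k: sum(v) for k, v in sales.items()}   # SUM(sales)
max_of_sums = max(per_group_sum.values())               # MAX(SUM(sales))

print(per_group_sum, max_of_sums)   # the US group has the largest sum
```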


Conditional clauses:
You can reduce the number of rows processed in the aggregation by specifying a conditional clause.
Example: SUM(salary, salary > 1000)
This will include only the salaries which are greater than 1000 in the SUM calculation.
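The conditional clause is equivalent to summing only the rows that satisfy the condition; a Python sketch with hypothetical salaries:

```python
# Python analogue of SUM(salary, salary > 1000): rows failing the
# condition simply do not contribute to the aggregate.
salaries = [800, 1300, 2975, 950, 5000]

conditional_sum = sum(s for s in salaries if s > 1000)
print(conditional_sum)   # 1300 + 2975 + 5000 = 9275
```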
Non-Aggregate Functions:
You can also use non-aggregate functions in aggregator transformation.
Example: IIF( SUM(sales) <20000, SUM(sales),0)
Note: By default, the Integration Service treats null values as NULL in aggregate functions. You can change this by configuring the integration service.


Incremental Aggregation:
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
Sorted Input:
You can improve the performance of aggregator transformation by specifying the sorted input. The Integration Service assumes all the data is sorted by group and it performs aggregate calculations as it reads rows for a group. If you specify the sorted input option without actually sorting the data, then integration service fails the session.


Connected and Active Transformation
The Aggregator transformation allows us to perform aggregate calculations, such as averages and sums.
Aggregator transformation allows us to perform calculations on groups.
Components of the Aggregator Transformation
1. Aggregate expression
2. Group by port
3. Sorted Input
4. Aggregate cache
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
  • All Group By Columns are in AGGREGATOR INDEX CACHE. Ex. DEPTNO
2) Aggregator Data Cache:
DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in Data Cache:
  • Variable ports if any
  • Non group by input/output ports.
  • Non group by input ports used in non-aggregate output expression.
  • Port containing aggregate function


1) Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.
  • EMP will be source table.
  • Create a target table EMP_AGG_EXAMPLE in target designer. Table should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
  • Create the shortcuts in your folder.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give mapping name. Ex: m_agg_example
4. Drag EMP from source in mapping.
5. Click Transformation -> Create -> Select AGGREGATOR from list. Give name and click Create. Now click done.
6. Pass SAL and DEPTNO only from SQ_EMP to AGGREGATOR Transformation.
7. Edit AGGREGATOR Transformation. Go to Ports Tab
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL, OUT_SUM_SAL
9. Open the Expression Editor one by one for all output ports and give the calculations. Ex: MAX(SAL), MIN(SAL), AVG(SAL), SUM(SAL)
10. Click Apply -> Ok.
11. Drag target table now.
12. Connect the output ports from the Aggregator to the target table.
13. Click Mapping -> Validate
14. Repository -> Save
  • Create Session and Workflow as described earlier. Run the Workflow and see the data in target table.
  • Make sure to give connection information for all tables.
Do's and Don'ts while using Sorted Input in Aggregator transformation


In general, follow this checklist to ensure that you are handling an aggregator with sorted input correctly:
1. Do not use sorted input if any of the following conditions are true:


a. The aggregate expression uses nested aggregate functions.
b. The session uses incremental aggregation.
c. Input data is data driven: you select Data Driven for the Treat Source Rows As session property, or an Update Strategy transformation appears before the Aggregator transformation in the mapping.
If you use sorted input under these circumstances, the Informatica Server reverts to default aggregate behavior, reading all values before performing aggregate calculations.


Remember, when you are using sorted inputs, you pass sorted data through the Aggregator.
Data must be sorted as follows:
1. By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
2. Using the same sort order configured for the session.
If data is not in strict ascending or descending order based on the session sort order, the Informatica Server fails the session.

Questions:

1. What is aggregator transformation?
Aggregator transformation performs aggregate calculations like sum, average, count etc. It is an active transformation: it can change the number of rows in the pipeline. Unlike an expression transformation (which performs calculations on a row-by-row basis), an aggregator transformation performs calculations on groups of rows.
2. What is aggregate cache?
The integration service creates index and data cache in memory to process the aggregator transformation and stores the data group in index cache, row data in data cache. If the integration service requires more space, it stores the overflow values in cache files.
3. How can we improve performance of aggregate transformation?
Use sorted input: sort the data before passing it into the aggregator. The integration service then performs aggregate calculations as it reads the rows for each group, instead of caching all the input data.
Filter the unwanted data before aggregating.
Limit the number of input/output or output ports to reduce the amount of data the aggregator transformation stores in the data cache.
4. What are the different types of aggregate functions?
The different types of aggregate functions are listed below:
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single aggregate transformation?
The nested aggregate function returns only one output row, whereas the single-level aggregate function returns more than one row. Since the numbers of rows returned are not the same, you cannot use both single-level and nested aggregate functions in the same transformation. If you include both single-level and nested functions in the same aggregator, the designer marks the mapping or mapplet as invalid. So, you need to create separate aggregator transformations.
6. Up to how many levels can you nest the aggregate functions?
We can nest up to two levels only.
Example: MAX( SUM( ITEM ) )
7. What is incremental aggregation?
The integration service performs aggregate calculations and then stores the data in historical cache. Next time when you run the session, the integration service reads only new data and uses the historical cache to perform new aggregation calculations incrementally.
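Incremental aggregation can be sketched in Python: cached per-group aggregates from previous runs are merged with only the new rows. The cache contents and new rows below are hypothetical:

```python
# Incremental aggregation sketch: aggregates from earlier runs live in a
# historical cache; the next run folds only the new rows into that cache.
historical_cache = {10: 2100, 20: 5975}     # SUM(SAL) per DEPTNO so far

def incremental_run(cache, new_rows):
    """Merge only the new rows into the cached aggregates."""
    for deptno, sal in new_rows:
        cache[deptno] = cache.get(deptno, 0) + sal
    return cache

# The next session run reads only new data, not the full history.
new_rows = [(10, 500), (30, 950)]
updated = incremental_run(historical_cache, new_rows)

print(updated)   # {10: 2600, 20: 5975, 30: 950}
```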
8. Why cannot we use the sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in the historical cache on the server, where the data need not be in sorted order. If you give sorted input, the records come presorted for that particular run, but the data in the historical cache may not be in sorted order. That is why this option is not allowed.
9. How are NULL values handled in the Aggregator?
You can configure the integration service to treat null values in aggregator functions as NULL or zero. By default the integration service treats null values as NULL in aggregate functions.


10. What can we do to improve the performance of Informatica Aggregator Transformation?
Aggregator performance improves dramatically if records are sorted before passing to the aggregator and sorted input option under aggregator properties is checked. The record set should be sorted on those columns that are used in Group By operation.
It is often a good idea to sort the record set at the database level, e.g. inside a source qualifier transformation, unless there is a chance that the already sorted records from the source qualifier can become unsorted again before reaching the aggregator.
11. Under what condition may selecting Sorted Input in the aggregator fail the session?
If the input data is not sorted correctly, the session will fail.
Also if the input data is properly sorted, the session may fail if the sort order by ports and the group by ports of the aggregator are not in the same order.
12. What is aggregate cache in aggregator transformation?
      The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.