Saturday, August 29, 2009

Regular Expression

There is some interesting complex string manipulations that cannot be solved easily by using BI EE specific functions. Just to illustrate the requirement, take a look at the screenshot below



As you see, the above is a simple BI EE report containing the customer related details. But the major problem with this report is that, it contains all the details of a customer in a comma separated format. Basically the source application has inserted the data into this table in such a fashion. One key aspect to this is the fact that the comma delimited customer details follow a specific pattern. That is the details column is nothing but a concatenation of the below customer attributes in the same order

1. Customer First Name
2. Customer Last Name
3. Customer Gender
4. Customer Email Address
5. Customer Phone Number
6. Customer Street Address
7. Customer City
8. Customer State
9. Customer Country

The requirement is to get a report containing the Customer ID, Country ID, Customer Phone Number and the Customer Email address. Normally the best place for handling this would be in the ETL stage wherein individual attributes can be split into multiple columns and then used for reporting. Unfortunately in most cases we would have to work with what we have. To do this in the reporting layer requires some complex INSTR, SUBSTR logic. Today we shall another approach for handling such requirements. Instead of using normal string manipulation functions, we shall be using a new feature of Oracle 10g called as the Regular Expressions.

Regular Expressions or RegEx make string manipulations easy and in fact provides quite a few advanced formatting, filtering options. The regular expressions come as 4 different types of functions

1. REGEXP_SUBSTR – Advanced Sub-String function
2. REGEXP_INSTR – Advanced String position evaluation function
3. REGEXP_LIKE – For doing advanced filtering on a result set
4. REGEXP_REPLACE – For doing advanced multiple-iterative string replace

In order to achieve the above requirement, we need to use these database functions. And of course, we need to use EVALUATE to call these database functions. To split the above comma separate customer attributes, we need to use RegEx functions given below

Customer Phone – REGEXP_REPLACE(CUSTOMER_DETAILS,’(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)’,'\4′)

Customer Email – REGEXP_REPLACE(CUSTOMER_DETAILS,’(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)’,'\3′)

Now use EVALUATE and pass down the above RegEx functions back to the database.



If you look at the report now, you should basically have the individual attributes in a separate column as shown below.



The above was one requirement. Lets look at another requirement now. In some cases, when you are reporting against transactional applications, you would have columns which get entered as free flowing text. The requirement is to classify these columns into 3 types. One is numeric, the other is free text without numbers and the third is alphanumeric. The requirement is to basically give a provision to the end users to choose a specific type and then that column should only the corresponding text i.e. for example if numeric is chosen only the text where numbers were entered should be displayed. In normal sql terms, filtering for numerals and alpha-numerals require separate logic and is complicated as well. We shall see how we can achieve this using RegEx.

Lets first take a look at the report


Now, lets create a prompt which will produce just 3 values (Numeric, Alphabets, Alpha-Numeric).



Add the follwing code
SELECT 'Numeric' FROM RegEx WHERE CUSTOMERS_DETAILS.COMMENTS IS NOT NULL
UNION ALL
SELECT 'Alphabets' FROM RegEx WHERE CUSTOMERS_DETAILS.COMMENTS IS NOT NULL
UNION ALL
SELECT 'Alpha-Numeric' FROM RegEx WHERE CUSTOMERS_DETAILS.COMMENTS IS NOT NULL

Make this prompt to set a presentation variable called AlphaNum.



Now go to the original report and create a filter as shown below

CASE WHEN '@{AlphaNum}{Numeric}' = 'Numeric'
THEN EVALUATE('REGEXP_SUBSTR(%1,''^\d+$'')',CUSTOMERS_DETAILS.COMMENTS)
WHEN '@{AlphaNum}{Numeric}' = 'Alphabets'
THEN EVALUATE('REGEXP_SUBSTR(%1,''^[a-zA-Z]+$'')',CUSTOMERS_DETAILS.COMMENTS)
ELSE EVALUATE('REGEXP_SUBSTR(%1,''^[a-zA-Z0-9]+$'')',CUSTOMERS_DETAILS.COMMENTS) END IS NOT NULL



Now include this report in the main dashboard along with the prompt.





As you see, the RegEx functions are pretty powerful. And of course, with the advent of EVALUATE, we now have the capability to leverage them from BI EE itself. This saves time and effort in implementing complex string based logics.