Oracle SQL and Informatica Performance Tuning: Oracle Optimizer and its Execution Plan

Why SQL Tuning ?

A SQL statement becomes a problem when it fails to perform according to a predetermined and measureable standard.

The 2 (Two) goals in performance tuning:

Reduce user response time, which means decreasing the time between when a user issues a statement and receives a response.
Improve throughput, which means using the least amount of resources necessary to process all rows accessed by a statements.

There are 2 (Two) concepts below:

SOFT parse

During the parse, the database performs a shared pool check to see if an existing parsed statement already exists and then determines whether it can skip resource-intensive steps of statement processing.

Hard parse

If Oracle Database cannot reuse existing code, then it must build a new executable version of the application code. This operation is known as a hard parse.

What is Oracle Cost Based Optimization ?

Oracle cost-based optimizer is designed to determine the most efficient path to access the data stored in the table by SQL statements, it generates an execution plan for a SQL statement, Oracle gathers statistics about the data in the tables and indexes in order to generate the most optimal execution plan; this is known as cost-based optimization. Comparing the execution plans generated by the optimizer will allow to judge the relative cost of one SQL statement versus another. These results can be use to make SQL statements optimal.

To collect the statistics, there are two methods you can use:

The ANALYZE statement can generate statistics for cost-based optimization:

ANALYZE TABLE sh.sales COMPUTE STATISTICS;

Using the DBMS_STATS Package

The PL/SQL package DBMS_STATS lets you generate and manage statistics for cost-based optimization:

exec dbms_stats.gather_table_stats (user,'SALES',cascade=>true);

Execution Plan :

An execution plan is a series of steps occurred in order to execute SQL Statements. see below:

The most complex SQL statement can be broken down into a series of basic operation.
To execute any SQL statement Oracle has to derive an execution plan.
Execution plan contains the series of steps and relationship between them.

SQL Tuning Tools:

AUTOTRACE :

The AUTOTRACE command is SQL*Plus generates the execution plan and statistics
about the performance of a query. This command provides statistics such as disk
reads and memory reads.

EXPLAIN PLAN

This SQL statement enables you to view the execution plan that the optimizer
would use to execute a SQL statement without actually executing the statement.

V$SQL_PLAN

These views contain information about executed SQL statements, and their
execution plans, that are still in the shared pool.
You can use the DBMS_XPLAN package methods to display the execution plan
generated by the EXPLAIN PLAN command and query of V$SQL_PLAN.

Running Explain plan:

EXPLAIN PLAN FOR
SELECT * FROM SH.COUNTRIES
WHERE
COUNTRY_ID = 52790;

SELECT * FROM TABLE (DBMS_XPLAIN.DISPLAY) ;

Types of Table Accesses:

Oracle can retrieve data in a number of ways. The most common techniques for retrieving table data are :

Read the entire table using a full table
Access a specific row using ROWID
Using an Index to locate the rows

TABLE ACCESS FULL:

A full table Scan sequentially reads each row from the table.
Oracle reads multiple blocks during each database read.
If there are million rows in a table, it will read all million rows it is reading table.

TABLE ACCESS BY ROWID:

Oracle reads rows by their ROWID pseudo column.

ROWID contains the physical location of the row.

Oracle uses INDEX to correlate data values with ROWID and help in improving the performance of a query.

The database finds the ROWID that go with these values and follows them to the precise table blocks, and rows within those blocks, that satisfy the query.

TABLE INDEX UNIQUE SCAN:

The scan searches the index in order for the specified key. An index unique scan stops processing as soon as it finds the first record because no second record is possible.
The database obtains the ROWID from the index entry, and then retrieves the row specified by the ROWID.

TABLE INDEX RANGE SCAN:

An index range scan is used when there is a NON UNIQUE index or you are searching for a range of values.

TABLE Choosing between FULL Table scan and Indexed Access:

Indexed reads examine only a small part of each block, the rows you want, instead of every row in the block, saving CPU time.
Indexed reads usually scale better as a table grows, giving stable performance, whereas a full table scan becomes steadily worse,
even while it might start out a little better than the indexed plan.
Index reads are almost always cached.

What should we look from Execution Plan:

Cardinality        - Looks for Estimate of the number of rows coming out of each of the operations.
Access Method - The way in which the data is being accessed, via either a table scan or index access.
Join Method      - The method (e.g., has, sort-merge, etc) used to join tables with each other.
Join Type          - The type of join (e.g., outer, anti semi, etc).
Join Order         - The order in which the tables are joined to each other.
Partition Pruning - Are only the necessary partitions being accessed to answer the query ?
Parallel Execution - In care of parallel execution, is each operation in the plan being conducted in parallel ? Is the right data redistribution method being used?

What is COST ?

The Oracle Optimizer is a cost-based optimizer.
The optimizer's cost model accounts for the IO, CPU, and network resources that will be used by the query.
The Optimizer selects the execution plan with the lowest cost, where cost represents the estimated resource usage for that plan.

SELECT consideration:

The simplest tuning rule is 'select the data only need, don't select all the data'

SELECT * FROM HR.EMPLOYEES E;

SELECT COLUMN_1, COLUMN_2, COLUMN_N
FROM HR.EMPLOYEES E ;

Use Table Aliases:

Adding table aliases and prefixing all column names by their aliases is a good rule to follow.
This is very important when there is more than 1 table involved in the query.
Mentioning table aliases reduces the parse time.

BAD

SELECT
ORDER_ID
, SALES_ID
, TOTAL_AMOUNT
, PRODUCT_NAME
, PRODUCT_CATEGORY
, COLOR

FROM

SALES_HISROTY, PRODUCT

WHERE
PRODUCT_ID = PRODUCT_ID

;

GOOD

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
, P.PRODUCT_CATEGORY
, P.COLOR

FROM

SALES_HISTORY S, PRODUCT P

WHERE
S.PRODUCT_ID = P.PRODUCT_ID

;

Use WHERE rather than HAVING:

HAVING clause filters the rows only after all rows have been fetched, grouped and sorted. So when ever possible try to filter as much as data using WHERE rather than HAVING. So when ever possible try to filter as much as data using WHERE rather than HAVING.
This reduces lot of overhead on the database.

BAD

SELECT
ORDER_ID
, PRODUCT_NAME
, SUM(TOTAL_AMOUNT)
FROM
SALES_HISTORY s
GROUP BY
ORDER_ID
, PRODUCT_NAME
HAVING ORDER_ID = 1267
;

GOOD

SELECT
S.ORDER_ID
, S.PRODUCT_NAME
, SUM(S.TOTAL_AMOUNT) AS TOTAL_AMOUNT
FROM
SALES_HISTORY s
WHERE
S.ORDER_ID = 1267
GROUP BY
S.ORDER_ID
, S.PRODUCT_NAME
;

Use Index Suppression Reasons:

There are many reasons why an Index might not be used even though you have an index on the column you are searching for.

Use of <> operator on an indexed column
Use of SUBSTR function on an indexed column
Use of Arithmetic operators(+,-,/,*) on an indexed column
Use of TRUNC function on an indexed column
Use of || operator on an indexed column
Data type conversion on an indexed column
Use of IS NULL/NOT NULL on an indexed column

Accidental Index Suppression:

Indexes can tell you what is in a table, but not what is not in a table.
!= and <> disables index usage.

BAD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
ORDER_ID <> 12345
;

GOOD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
ORDER_ID > 12345 AND
ORDER_ID < 12345 ;

SUBSTR function disables the index:

BAD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
SUBSTR(PRODUCT_NAME, 1, 3) = ‘SAM’
;

GOOD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
PRODUCT_NAME LIKE ‘SAM%’
;

function - , + , / , * operators disables the indexes:

BAD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
ORDER_ID + 1000 < 5000
;

GOOD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
ORDER_ID < 4000
;

function TRUNC function disables the index:

BAD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
TRUNC(SALES_DATE,'MON') = '01-JAN-2015'
;

GOOD

Use between instead of TRUNC function

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
SALES_DATE BETWEEN '01-JAN-2015' AND '31-JAN-2015'
;

function  || operator disables the index:

BAD

SELECT
*
FROM
PRODUCT
WHERE
COLOR || PRODUCT_CATEGORY = ‘BLACKMobile’ ;

GOOD

Use AND operator

SELECT
*
FROM
PRODUCT
WHERE
COLOR = ‘BLACK’             AND
PRODUCT_CATEGORY = ‘Mobile’ ;

Comparing a character column to a number value:

BAD

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
PRODUCT_NAME = 123 ;

GOOD

Use AND operator

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
PRODUCT_NAME = '123' ;

Function based Indexes:

Where ever possible try to avoid using functions on an indexed column, But if there is a business requirement which you can not avoid, then instead of using functions on the column, try to use function based Indexes.

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
SUBSTR(PRODUCT_NAME, 1, 3 ) = 'SAM' ;

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY) ;

CREATE INDEX product_s_idx ON SALES_HISROTY (SUBSTR(PRODUCT_NAME,1,3));

Exec dbms_stats.gather_table_stats(user,’SALES_HISTORY’, cascade=>true);

Use UNION instead of OR:

Consider using UNION ALL when you have to use OR condition on 2 index columns.

SELECT
ORDER_ID
, SALES_DATE
, TOTAL_AMOUNT
, PRODUCT_NAME
FROM
SALES_HISTORY s
WHERE
PRODUCT_NAME = 'SAMSUNG' OR
SALES_DATE = '01-JAN-2015' ;

GOOD

SELECT
S.ORDER_ID
,S.SALES_DATE
,S.TOTAL_AMOUNT
,S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.PRODUCT_NAME = 'SAMSUNG'

UNION ALL

SELECT
S.ORDER_ID
,S.SALES_DATE
,S.TOTAL_AMOUNT
,S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.SALES_DATE = '01-JAN-2015' ;

Use UNION ALL instead of UNION:

Avoid Unnecessary sorts !!!
Don’t sort if order is not import.
UNION performs an extra operation called SORT UNIQUE, which is really expensive.

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.PRODUCT_NAME = 'SAMSUNG'

UNION

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
S.SALES_HISTORY S
WHERE
S.SALES_DATE = '01-JAN-2015' ;

Better

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.PRODUCT_NAME = 'SAMSUNG'

UNION ALL

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.SALES_DATE = '01-JAN-2015' ;

Use Function based Indexes:

Where ever possible try to avoid using functions on an indexed column, But if there is a business requirement which you can not avoid, then instead of using functions on the column, try to use function based Indexes.

CREATE INDEX product_s_idx ON SALES_HISROTY (SUBSTR(PRODUCT_NAME,1,3));

Exec dbms_stats.gather_table_stats(user,’SALES_HISTORY’, cascade=>true);

EXPLAIN PLAN FOR
SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
SUBSTR(S.PRODUCT_NAME, 1, 3 ) = ‘SAM’ ;

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY) ;

Use UNION instead of OR:

Consider using UNION when you have to use OR condition on 2 index columns.

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.PRODUCT_NAME = 'SAMSUNG' OR
S.SALES_DATE = '01-JAN-2015' ;

GOOD

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.PRODUCT_NAME = 'SAMSUNG'

UNION ALL

SELECT
S.ORDER_ID
, S.SALES_DATE
, S.TOTAL_AMOUNT
, S.PRODUCT_NAME
FROM
SALES_HISTORY S
WHERE
S.SALES_DATE = '01-JAN-2015' ;

Use Truncate command instead of Delete:

When ever we delete data from the table, rollback segments are used to hold UNDO information. If you do not commit the transaction, Oracle restores the data to the state it was in before the transaction started.

With TRUNCATE, no UNOD information is generated. Once the table is truncated, the data can not be recovered back. Hence it is faster and needs fewer resources.

So, use TRUNCATE rather than DELETE when you need no UNDO information.

TRUNCATE TABLE SALES_HISTORY;

Issue Frequent COMMIT statements:

If you are writing a PL/SQL block or any other programming block to perform DML operations (Insert, delete, update), try to issue a COMMIT as frequently as possible so than……….
Information held in the rollback segments to UNDO the transaction is freed up.
Locks on the table are released.
Space on the REDO log buffer cache is freed up.

DECLARE

l_dept_id employees.department_id%TYPE :=10;

BEGIN

DELETE FROM EMPLOYEES
WHERE
department_id = l_dept_id ;

COMMIT;

DBMS_OUTPUT.put_line (SQL%ROWCOUNT) ;
UPDATE employees
  SET salary = salary * 1.2
WHERE
department_id = l_dept_id;

COMMIT;

DBMS_OUTPUT.put_line (SQL%ROWCOUNT) ;

INSERT INTO employees ( employee_id
   , last_name
   , department_id
   , salary )
  VALUES (100
   , 'Feuerstein'
   , 10
   , 200000 ) ;

DBMS_OUTPUT.put_line (SQL%ROWCOUNT) ;

COMMIT ;

END;

Bulk Collect:

The bulk processing features of PL/SQL are designed specifically to reduce the number of context switches required to communicate from the PL/SQL engine to the SQL engine.
Use the BULK COLLECT clause to fetch multiple rows into one or more collections with a single context switch.

DECLARE

TYPE two_cols_rt IS RECORD
(
employee_id           employees.employee_id%TYPE,
salary                      employees.salary%TYPE
);

TYPE employee_info_t IS TABLE OF two_cols_rt;

l_employees          employee_info_t;

BEGIN
SELECT
   employee_id, salary
BULK COLLECT INTO l_employees
FROM
employees
WHERE
department_id = 10;

END;

Use EXISTS Rather than IN:

Use IN to check if a value is contained in a list. EXISTS is different from IN: EXISTS just checks for the existence of rows, whereas IN checks actual values. EXISTS typically offers better performance than IN with subqueries. Therefore you should use EXISTS rather than IN whenever possible.

The following query uses IN ( BAD since EXISTS would work ) to retrieve products that have been purchased:

BAD ( Uses IN rather than EXISTS )

SELECT
PRODUCT_ID, NAME
FROM
PRODUCTS
WHERE
PRODUCT_ID IN (
SELECT PRODUCT_ID
FROM
PURCHASES
)
;

The below query rewrites the previous example to use EXISTS:

GOOD ( uses EXISTS rather than IN )

SELECT
PRODUCT_ID, NAME
FROM
PRODUCTS PROD
WHERE
EXISTS (
SELECT
1
FROM
PURCHASES PURC
WHERE
PROD.PRODUCT_ID = PURC.PRODUCT_ID
)
;

Use EXISTS Rather than DISTINCT:

You can suppress the display of duplicate rows using DISTINCT; You use EXISTS to check for the existence of rows returned by a subquery. Whenever possible, you should use EXISTS rather than DISTINCT sorts the retrieved rows before suppressing the duplicate rows.
The following query uses DISTINCT (BAD since EXISTS would work ) to retrieve products that have been purchases:

BAD ( Uses DISTINCT when EXISTS would work )

SELECT DISTINCT
PR.PRODUCCT_ID,
PR.NAME
FROM
PRODUCTS PROD, PRUCHASES PURC
WHERE
PROD.PRODUCT_ID = PURC.PRODUCT_ID;

The below query rewrites the previous example to use EXISTS rather than DISTINCT :

GOOD ( uses EXISTS rather than DISTINCT )

SELECT
PRODUCT_ID,
NAME
FROM
PRODUCTS PROD
WHERE
EXISTS (
SELECT
1
FROM
PURCHASES PURC
WHERE
PROD.PRODUCT_ID = PURC.PRODUCT_ID
)
;

Join Method

There are 3 main join methods in Oracle:

Nested Loop Join
Sort Merge Join
Hash Join

Use Nested Loop Join:

Common join techniques

Starts returning rows quickly

No pre-processing of data

CBO will usually choose smaller table to drive join

Repeated probes done via index

Oracle reads the first row from the first row source and then checks the second row source for matches.
All matches are then placed in the result set and Oracle goes on to the next row from the first row source.
This continues until all rows in the first row source have been processed.
The first row source is often called the outer table or driving table, while the second row source is called the inner table.
Smaller result set should be the driving table
Larger result set should be the inner table

When to use ?

Driving row source is small and the joined columns of the inner row source are uniquely indexed
This is ideal for query screens where an end-user can read the first few records retrieved while the rest are being fetched.
Typically for OLTP queries.

When not to use ?

Nested loops joins can be very inefficient if the inner row source does not have an index on the joined columns or if the index is not highly selective.

Use Hash Join:

Hash joins are used for joining large data sets ( data warehouse, batch ).
The optimizer uses the smaller of the two tables to build a hash table, based on the join key, in memory.
It then scans the larger table, and performs the same hashing algorithm on the join columns(s).
It then probes the previously built hash table for each value and if they match, it returns a row.
Needs 1 mg + hash area size.
Works only for Equi Join.

Oracle SQL and Informatica Performance Tuning

Wednesday, August 10, 2016

Oracle Optimizer and its Execution Plan

Using the DBMS_STATS Package

2 comments: