Minimum Tools Necessary for Troubleshooting
The bare minimum tools necessary for troubleshooting database related issues are as following
AWR Report
AWR (Automatic Workload Repository) report is a Oracle tool which provides a comprehensive list of database activities between a specified interval of time.
· To get the AWR report take the snapshots before and after the potential bottleneck.
SQL> exec dbms_workload_repository.create_snapshot ;
· After taking the AWR snapshot run the below command to take the snap_id (that will be used to generate the report)
SQL> select max(snap_id) from dba_hist_snapshot;
· Generate the AWR report using the following
sqlplus
ADDM Report
ADDM (Automatic Database Diagnostic Monitor) report is an Oracle tool which provides upfront recommendations of database activities between a specified interval of time. For each of the identified issues it locates the root cause and provides recommendations for correcting the problem. An ADDM analysis task is performed and its findings and recommendations stored in the database every time an AWR snapshot is taken provided the STATISTICS_LEVEL parameter is set to TYPICAL or ALL. ADDM report can be execute by as following
· Take a snapshot before and after the potential bottleneck.
SQL> exec dbms_workload_repository.create_snapshot ;
· After taking the AWR snapshot run the below command to take the snap_id (that will be used to generate the report)
SQL> select max(snap_id) from dba_hist_snapshot;
· Generate the ADDM report using the following
sqlplus
Statistics Collection
Make sure that statistics are current on the database and gathered using DBMS_STATS package as mentioned below
SQL> alter session enable parallel dml;
SQL> exec dbms_stats.gather_schema_stats(ownname => '
Explain Plan of SQL statement
If the issue is related to a slow down of a particular pre-identified SQL statement then we need to get the SQL plan from the production system to make sure the right access path is taken. To do this the following three options are there.
· If you have already identified the SQL_ID of the SQL then it will be just a matter of querying the V$SQL_PLAN table to get the explain plan.
· The other way is to get the explain plan using the following way on sqlplus SQL> explain plan for < sql statement > ; SQL> SELECT PLAN_TABLE_OUTPUT FROM
TABLE (DBMS_XPLAN.DISPLAY ());
· The other way to get the explain plan is using Tkprof. Please refer to the “Requesting Tkprof” section to know the different kinds of Tkprofs.
To troubleshoot the SQL plan please refer to the section “Troubleshooting Bad SQL Plan”
Troubleshooting Bad SQL Plan
Troubleshooting Bad SQL Plan can be done in following ways
Cross check all base provided Indexes
The following SQL can be executed to give a comprehensive list of Indexes for the tables in question.
SQL> select distinct a.table_OWNER, c.table_name,a.index_owner,a.index_name,c.num_rows Table_Rows ,b.num_rows Index_Rows,b.sample_size,a.table_owner,a.column_name,a.column_position ,b.uniqueness,b.last_analyzed
from dba_ind_columns a, dba_indexes b, dba_tables c
Where A.Index_Name = B.Index_Name
And A.Index_Owner = B.Owner
And A.Table_Owner =B.Table_Owner
AND B.TABLE_OWNER = C.OWNER
And B.Table_Name = C.Table_Name
and a.table_name = b.table_name and a.table_name in (‘< List of Table Names>’)
order by a.table_owner,a.index_name,a.column_position
Full scans
Make sure that there is no full scans in the explain plan
Table access + Index access instead of Index only access
In almost all the cases if the SQL can make the Index only access then that is the best plan but that is subjected to the Index feasibility.
Clustering Factor of the Indexes
Make sure that that the clustering factor of the Indexes is good (which means close to the number of blocks then to the number of rows). Good clustering factor of Indexes means the rows are nicely packed in the blocks and the SQL query result can result from few numbers of blocks which ultimately leads to less CPU and IO overhead.
Indexes with Low Cardinality
Indexes which have low cardinality are likely to cause scalability issues. These Indexes leads to buffer busy waits when large numbers of threads are trying to obtain the lock on the same block. To mitigate this, Indexes with Low Cardinality Indexes should be avoided.
Database Optimizer Parameters for Indexes
There are three optimizer parameters which affect SQL plan in terms of giving preference to Index access in the SQL plan. As our application is mostly Index driven so make sure that these parameters are appropriately set. Please note that these are the recommended starting values but customer can adjust this to values that give maximum performance gains. These parameters are
· optimizer_index_cost_adj - This should be set to 1
· optimizer_index_caching - This should be set to 100.
· optimizer_mode - This should be ALL_ROWS from batch and FIRST_ROWS_10 for online
Requesting Tkprofs
Tkprofs are good source for troubleshooting customer’s issues as they provide information about wait events on SQL level, provide list of all SQLs running during the duration when you are tracing the sessions and also gives a run time explain plan. There are multiple ways to get the Tkprofs
Basic Tkprofs
To enable the Tkprofs set the below before you start the test
SQL> ALTER SYSTEM SET trace_enabled = TRUE ;
SQL> ALTER SYSTEM SET sql_trace = TRUE ;
SQL> ALTER SYSTEM SET timed_statistics=TRUE;
SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 2';
Once above is set the traces will start coming at the location defined by “user_dump_dest” parameter.
Once the test to troubleshoot the issue is completed run the following to stop the trace
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';
SQL> ALTER SYSTEM SET sql_trace = FALSE;
SQL> ALTER SYSTEM SET trace_enabled = FALSE;
Concatenate the *ora* files and take the Tkprof as below
tkprof
Tkprofs with Bind Variables
To enable the Tkprofs to get the bind variables set the below before you start the test
SQL> ALTER SYSTEM SET trace_enabled = TRUE ;
SQL> ALTER SYSTEM SET sql_trace = TRUE ;
SQL> ALTER SYSTEM SET timed_statistics=TRUE;
SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 4';
Once above is set the traces will start coming at the location defined by “user_dump_dest” parameter.
Once the test to troubleshoot the issue is completed run the following to stop the trace
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';
SQL> ALTER SYSTEM SET sql_trace = FALSE;
SQL> ALTER SYSTEM SET trace_enabled = FALSE;
Concatenate the *ora* files and take the Tkprof as below
tkprof
The concatenated trace file provides the bind variables used in the sessions.
Tkprofs with Wait Events
To enable the Tkprofs to get the wait events set the below before you start the test
SQL> ALTER SYSTEM SET trace_enabled = TRUE ;
SQL> ALTER SYSTEM SET sql_trace = TRUE ;
SQL> ALTER SYSTEM SET timed_statistics=TRUE;
SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 8';
Once above is set the traces will start coming at the location defined by “user_dump_dest” parameter.
Once the test to troubleshoot the issue is completed run the following to stop the trace
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';
SQL> ALTER SYSTEM SET sql_trace = FALSE;
SQL> ALTER SYSTEM SET trace_enabled = FALSE;
Concatenate the *ora* files and take the Tkprof as below
tkprof
Tkprofs with Bind Variables and Wait Events
To enable the Tkprofs to get the bind variables and wait events set the below before you start the test
To enable the Tkprofs to get the wait events set the below before you start the test
SQL> ALTER SYSTEM SET trace_enabled = TRUE ;
SQL> ALTER SYSTEM SET sql_trace = TRUE ;
SQL> ALTER SYSTEM SET timed_statistics=TRUE;
SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 12';
Once above is set the traces will start coming at the location defined by “user_dump_dest” parameter.
Once the test to troubleshoot the issue is completed run the following to stop the trace
SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';
SQL> ALTER SYSTEM SET sql_trace = FALSE;
SQL> ALTER SYSTEM SET trace_enabled = FALSE;
Concatenate the *ora* files and take the Tkprof as below
tkprof
The concatenated trace file provides the bind variables used in the session also.
AWR Analysis for Troubleshooting
AWR Report has a comprehensive list of data which is helpful in troubleshooting a database issue. There are few points which I want to highlight.
Top 5 Timed Foreground wait events
The top 5 timed foreground wait events is the first place to start the troubleshoot as it gives a good overview of potential issues.
SQL Statistics
This section of the AWR report gives a very comprehensive list of SQLs ordered by elapsed time, CPU, physical disk reads, consistent gets and executions. This is very helpful to see SQL statistics from these different perspectives.
Instance Activities
This section is helpful to see the Instance activities from per sec and per transaction level.
IO Stats
This section is helpful to see if there are issues related to IO. This helps us to visualize the IO bottleneck from the Tablespace as well as File level.
Advisory Pools
This section is helpful to see if the different pool s are sized properly and also gives an idea if they are undersized.
Wait Stats
This section is helpful to see the wait statistics.
Segment Statistics
This section is helpful to see issues related to the segment level and is one of the most important section for troubleshooting.
Database Initialization Parameters
The following database initialization parameters are good to start with and are recommended by the product. These can be further tuned based on the Implementation.
· processes=3000
· sessions=4500
· sga_max_size= < set according to physical memory available >
· sga_target= < set according to physical memory available >
· pga_aggregate_target=
available >
· memory_target=0 < DO NOT SET THIS >
· memory_max_target=0 < DO NOT SET THIS >
· db_writer_processes=12
· db_block_size=8192
· log_buffer=250M
· log_checkpoint_interval=0
· db_file_multiblock_read_count=8
· dml_locks=4860
· transactions=3000
· undo_retention=900
· sec_case_sensitive_logon=FALSE
· session_cached_cursors=0
· parallel_min_servers=32
· parallel_max_servers=256
· open_cursors=3000
· optimizer_index_cost_adj=1
· optimizer_index_caching=100
No comments:
Post a Comment