Thursday, November 10, 2011

TQ-Oracle DB tuning basics

Minimum Tools Necessary for Troubleshooting

The bare minimum tools necessary for troubleshooting database-related issues are as follows.

AWR Report

The AWR (Automatic Workload Repository) report is an Oracle tool that provides a comprehensive view of database activity between a specified interval of time.

· To get the AWR report, take snapshots before and after the potential bottleneck.

SQL> exec dbms_workload_repository.create_snapshot ;

· After taking the AWR snapshot, run the command below to note the snap_id (it will be used to generate the report):

SQL> select max(snap_id) from dba_hist_snapshot;

· Generate the AWR report by running the following:

sqlplus / @$ORACLE_HOME/rdbms/admin/awrrpt.sql
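If running the interactive script is not convenient, the same report can be produced directly from SQL using the DBMS_WORKLOAD_REPOSITORY report functions (a minimal sketch; the DBID, instance number and snapshot IDs are placeholders taken from V$DATABASE and DBA_HIST_SNAPSHOT):

SQL> set long 1000000 pagesize 0

SQL> SELECT output FROM TABLE(dbms_workload_repository.awr_report_text(<dbid>, <inst_num>, <begin_snap_id>, <end_snap_id>));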

ADDM Report

ADDM (Automatic Database Diagnostic Monitor) is an Oracle tool that provides upfront recommendations about database activity between a specified interval of time. For each identified issue it locates the root cause and provides recommendations for correcting the problem. An ADDM analysis task is performed, and its findings and recommendations stored in the database, every time an AWR snapshot is taken, provided the STATISTICS_LEVEL parameter is set to TYPICAL or ALL. The ADDM report can be generated as follows:

· Take a snapshot before and after the potential bottleneck.

SQL> exec dbms_workload_repository.create_snapshot ;

· After taking the AWR snapshot, run the command below to note the snap_id (it will be used to generate the report):

SQL> select max(snap_id) from dba_hist_snapshot;

· Generate the ADDM report by running the following:

sqlplus / @$ORACLE_HOME/rdbms/admin/addmrpt.sql
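On Oracle 11g and later, the analysis can also be run without the interactive script through the DBMS_ADDM package (a sketch; the task name is arbitrary and the snapshot IDs are placeholders):

SQL> var tname varchar2(60)

SQL> exec :tname := 'ADDM_manual_run'; dbms_addm.analyze_db(:tname, <begin_snap_id>, <end_snap_id>);

SQL> set long 1000000 pagesize 0

SQL> select dbms_addm.get_report(:tname) from dual;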

Statistics Collection

Make sure that statistics on the database are current and are gathered using the DBMS_STATS package, as shown below:

SQL> alter session enable parallel dml;

SQL> exec dbms_stats.gather_schema_stats(ownname => '<schema_name>', estimate_percent => 100, cascade => TRUE, degree => <degree>, granularity => 'ALL', force => TRUE);
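If only a few tables are stale, a single-table gather is usually sufficient (a sketch; the schema, table name and degree are placeholders):

SQL> exec dbms_stats.gather_table_stats(ownname => '<schema_name>', tabname => '<table_name>', estimate_percent => 100, cascade => TRUE, degree => <degree>);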

Explain Plan of SQL statement

If the issue is a slowdown of a particular, pre-identified SQL statement, we need to get the SQL plan from the production system to make sure the right access path is being used. There are three options for this.

· If you have already identified the SQL_ID of the statement, getting the explain plan is just a matter of querying V$SQL_PLAN (see the example after this list).

· Another way is to generate the explain plan in SQL*Plus:

SQL> explain plan for <sql statement>;

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

· A third way to get the explain plan is through tkprof. Refer to the "Requesting Tkprofs" section for the different kinds of tkprof traces.
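For example, when the SQL_ID is already known, the plan can be pulled from the cursor cache with DBMS_XPLAN.DISPLAY_CURSOR or by querying V$SQL_PLAN directly (a sketch; the SQL_ID is a placeholder):

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>', NULL, 'TYPICAL'));

SQL> select id, operation, options, object_name, cost from v$sql_plan where sql_id = '<sql_id>' order by id;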

To troubleshoot the SQL plan, refer to the section "Troubleshooting Bad SQL Plan".

Troubleshooting Bad SQL Plan

A bad SQL plan can be troubleshot in the following ways.

Cross-check all the base provided indexes

The following SQL can be executed to give a comprehensive list of indexes for the tables in question.

SQL> select distinct a.table_owner, c.table_name, a.index_owner, a.index_name,
            c.num_rows table_rows, b.num_rows index_rows, b.sample_size,
            a.column_name, a.column_position, b.uniqueness, b.last_analyzed
     from dba_ind_columns a, dba_indexes b, dba_tables c
     where a.index_name = b.index_name
     and a.index_owner = b.owner
     and a.table_owner = b.table_owner
     and b.table_owner = c.owner
     and b.table_name = c.table_name
     and a.table_name = b.table_name
     and a.table_name in ('<list of table names>')
     order by a.table_owner, a.index_name, a.column_position;

Full scans

Make sure there are no full scans in the explain plan.

Table access + Index access instead of Index only access

In almost all cases, if the SQL can use index-only access, that is the best plan, subject to index feasibility.

Clustering Factor of the Indexes

Make sure that the clustering factor of the indexes is good, which means closer to the number of blocks than to the number of rows. A good clustering factor means the rows are tightly packed in the blocks, so a SQL query can be answered from fewer blocks, which ultimately leads to less CPU and I/O overhead.
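A quick way to check this from the data dictionary (a sketch; the table list is a placeholder): compare CLUSTERING_FACTOR with BLOCKS and NUM_ROWS, where a value close to BLOCKS is good and a value close to NUM_ROWS is poor.

SQL> select i.owner, i.index_name, i.clustering_factor, t.blocks, t.num_rows
     from dba_indexes i, dba_tables t
     where i.table_owner = t.owner
     and i.table_name = t.table_name
     and i.table_name in ('<list of table names>')
     order by i.owner, i.index_name;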

Indexes with Low Cardinality

Indexes with low cardinality are likely to cause scalability issues: they lead to buffer busy waits when large numbers of threads try to obtain a lock on the same block. To mitigate this, low-cardinality indexes should be avoided.
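Candidate indexes can be spotted from the data dictionary (a sketch; the table list is a placeholder). A ratio of DISTINCT_KEYS to NUM_ROWS close to zero indicates low cardinality:

SQL> select owner, index_name, distinct_keys, num_rows,
     round(distinct_keys / nullif(num_rows, 0), 4) selectivity
     from dba_indexes
     where table_name in ('<list of table names>')
     order by selectivity;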

Database Optimizer Parameters for Indexes

There are three optimizer parameters that affect the SQL plan by giving preference to index access. As our application is mostly index driven, make sure these parameters are set appropriately. Note that these are recommended starting values; the customer can adjust them to the values that give the maximum performance gain. The parameters are listed below (an example of applying them follows the list):

· optimizer_index_cost_adj - This should be set to 1

· optimizer_index_caching - This should be set to 100.

· optimizer_mode - This should be ALL_ROWS for batch and FIRST_ROWS_10 for online.
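Assuming an spfile is in use, the starting values above can be applied as follows (a sketch; validate through your normal change process):

SQL> ALTER SYSTEM SET optimizer_index_cost_adj = 1 SCOPE=BOTH;

SQL> ALTER SYSTEM SET optimizer_index_caching = 100 SCOPE=BOTH;

SQL> ALTER SESSION SET optimizer_mode = ALL_ROWS;   -- or FIRST_ROWS_10 for online sessions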

Requesting Tkprofs

Tkprof reports are a good source for troubleshooting customer issues: they provide wait-event information at the SQL level, list all SQL statements running while the sessions are being traced, and give a run-time explain plan. There are multiple ways to get tkprof output.

Basic Tkprofs

To enable basic tracing, set the following before you start the test:

SQL> ALTER SYSTEM SET trace_enabled = TRUE ;

SQL> ALTER SYSTEM SET sql_trace = TRUE ;

SQL> ALTER SYSTEM SET timed_statistics=TRUE;

SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 2';

Once the above is set, trace files will be written to the location defined by the user_dump_dest parameter.

Once the test to troubleshoot the issue is complete, run the following to stop the trace:

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';

SQL> ALTER SYSTEM SET sql_trace = FALSE;

SQL> ALTER SYSTEM SET trace_enabled = FALSE;

Concatenate the *ora* trace files and generate the tkprof report as below:

tkprof <trace_file> <output_file> sys=no sort=EXEELA
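For example, assuming the trace files are in the directory pointed to by user_dump_dest, the concatenation and report step might look like this (file names are illustrative):

cd <user_dump_dest_directory>

cat *ora*.trc > combined.trc

tkprof combined.trc tkprof_report.txt sys=no sort=EXEELA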

Tkprofs with Bind Variables

To enable tracing with bind variables, set the following before you start the test:

SQL> ALTER SYSTEM SET trace_enabled = TRUE ;

SQL> ALTER SYSTEM SET sql_trace = TRUE ;

SQL> ALTER SYSTEM SET timed_statistics=TRUE;

SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 4';

Once the above is set, trace files will be written to the location defined by the user_dump_dest parameter.

Once the test to troubleshoot the issue is complete, run the following to stop the trace:

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';

SQL> ALTER SYSTEM SET sql_trace = FALSE;

SQL> ALTER SYSTEM SET trace_enabled = FALSE;

Concatenate the *ora* trace files and generate the tkprof report as below:

tkprof <trace_file> <output_file> sys=no sort=EXEELA

The concatenated trace file provides the bind variables used in the sessions.

Tkprofs with Wait Events

To enable tracing with wait events, set the following before you start the test:

SQL> ALTER SYSTEM SET trace_enabled = TRUE ;

SQL> ALTER SYSTEM SET sql_trace = TRUE ;

SQL> ALTER SYSTEM SET timed_statistics=TRUE;

SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 8';

Once the above is set, trace files will be written to the location defined by the user_dump_dest parameter.

Once the test to troubleshoot the issue is complete, run the following to stop the trace:

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';

SQL> ALTER SYSTEM SET sql_trace = FALSE;

SQL> ALTER SYSTEM SET trace_enabled = FALSE;

Concatenate the *ora* trace files and generate the tkprof report as below:

tkprof <trace_file> <output_file> sys=no sort=EXEELA

Tkprofs with Bind Variables and Wait Events

To enable tracing with both bind variables and wait events, set the following before you start the test:

SQL> ALTER SYSTEM SET trace_enabled = TRUE ;

SQL> ALTER SYSTEM SET sql_trace = TRUE ;

SQL> ALTER SYSTEM SET timed_statistics=TRUE;

SQL> ALTER SYSTEM SET STATISTICS_LEVEL=ALL;

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context forever, level 12';

Once the above is set, trace files will be written to the location defined by the user_dump_dest parameter.

Once the test to troubleshoot the issue is complete, run the following to stop the trace:

SQL> ALTER SYSTEM SET EVENTS '10046 trace name context off';

SQL> ALTER SYSTEM SET sql_trace = FALSE;

SQL> ALTER SYSTEM SET trace_enabled = FALSE;

Concatenate the *ora* trace files and generate the tkprof report as below:

tkprof <trace_file> <output_file> sys=no sort=EXEELA

The concatenated trace file also provides the bind variables used in the sessions.

AWR Analysis for Troubleshooting

The AWR report contains a comprehensive set of data that is helpful in troubleshooting a database issue. There are a few sections I want to highlight.

Top 5 Timed Foreground wait events

The Top 5 Timed Foreground Events section is the first place to start troubleshooting, as it gives a good overview of potential issues.

SQL Statistics

This section of the AWR report gives a very comprehensive list of SQLs ordered by elapsed time, CPU, physical disk reads, consistent gets and executions. This is very helpful to see SQL statistics from these different perspectives.

Instance Activities

This section is helpful for viewing instance activity at the per-second and per-transaction level.

IO Stats

This section is helpful for identifying I/O-related issues, and helps visualize I/O bottlenecks at both the tablespace and the file level.

Advisory Pools

This section is helpful for checking whether the different pools are sized properly, and indicates whether any of them are undersized.

Wait Stats

This section is helpful to see the wait statistics.

Segment Statistics

This section is helpful for identifying issues at the segment level and is one of the most important sections for troubleshooting.

Database Initialization Parameters

The following database initialization parameters are good starting values and are recommended by the product. These can be further tuned based on the implementation.

· processes=3000

· sessions=4500

· sga_max_size= < set according to physical memory available >

· sga_target= < set according to physical memory available >

· pga_aggregate_target= < set according to physical memory available >

· memory_target=0 < DO NOT SET THIS >

· memory_max_target=0 < DO NOT SET THIS >

· db_writer_processes=12

· db_block_size=8192

· log_buffer=250M

· log_checkpoint_interval=0

· db_file_multiblock_read_count=8

· dml_locks=4860

· transactions=3000

· undo_retention=900

· sec_case_sensitive_logon=FALSE

· session_cached_cursors=0

· parallel_min_servers=32

· parallel_max_servers=256

· open_cursors=3000

· optimizer_index_cost_adj=1

· optimizer_index_caching=100
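As an illustration, static parameters are set in the spfile and picked up at the next restart, while dynamic ones can take effect immediately (a sketch using values from the list above):

SQL> ALTER SYSTEM SET processes = 3000 SCOPE=SPFILE;

SQL> ALTER SYSTEM SET db_writer_processes = 12 SCOPE=SPFILE;

SQL> ALTER SYSTEM SET open_cursors = 3000 SCOPE=BOTH;

(Restart the instance for the SCOPE=SPFILE changes to take effect.)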

Sunday, May 1, 2011

Understanding and tuning Garbage Collection for BankfusionUniversalBanking

Just Garbage ----Garbage and Garbage………….All Garbage……

But you will know why garbage is so important for BFUB application!!!

Clearing of Garbage!!!


Basic understanding of GC

I don't want to go into the details of GC, memory management and all that. But just to refresh, so that the results are easier to understand, here is a quick refresher.

Conceptually, garbage collection (GC) creates the illusion of infinite free space.

– Java has a create (“new”) but no destroy

– Applications create objects as needed on the Heap

In reality, GC reclaims unused memory back to the free lists

– Finds objects that are no longer used

– Makes their storage available for allocation

All garbage collectors follow the same formula

Find all live objects (Mark)

– Trace the object graph from a set of known starting points (e.g., Thread stacks). Known as “The Root Set”

Recycle objects not found onto the free list (Sweep)

– Objects not visible in the live set are “dead”

Optional: Move objects to reduce fragmentation (Compact)

– Free bits of memory here and there create holes

– Cannot allocate object even if total free space is sufficient

– Converts many small holes into fewer large ones

IBM Java GC has a number of selectable policies under which it will recycle objects

Why have many policies? Why not just “the best”?

– Cannot always dynamically determine what tradeoffs the user/application is willing to make

• Pause time vs. Throughput

• Footprint vs. Frequency

This is why we are tuning GC for BFUB application.

Verbose GC logs and its analysis

Location to get verbose GC logs:

Application servers > server1 > Process definition > Java Virtual Machine

The -verbose:gc option is the main diagnostic available for runtime analysis of the Garbage Collector; it can be added to the generic JVM arguments.

The native_stderr.log file will be generated at the following location:

/ibm/WebSphere7/AppServer/profiles/AppSrv01/logs/server1
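For example, the following generic JVM arguments (IBM J9) enable verbose GC and, optionally, redirect the GC records to a dedicated rotating log instead of native_stderr.log (the path and rotation values are illustrative):

-verbose:gc -Xverbosegclog:/tmp/verbosegc.log,5,10000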

Analyzer

For the analysis, we can use following tools from IBM:

· IBM support Assistant - Garbage Collection and Memory Visualizer

· GC Analyzer – ga402

Please don't try Sun JDK logs in the IBM GC analyzer, or vice versa. Why? You'd better know!!

Also, please don’t ask why I am using only IBM tools!!!


Why does BankfusionUB need optimal GC?

· If Java is there, garbage will be there. So how will BFUB, a high-throughput application, cope with it?

· Although IBM ships tuned and intelligent GC defaults, it still leaves leeway for an application like BFUB to optimize further.

· Don’t we want Tier 1 bank!!!!

Always remember Our Aim!!

· Reduce GC pause time overhead as much as possible.

· To improve performance of BFUB and make it scalable.

How reducing GC pause time leads to improved performance, we will see in this document!!

Test case and Server specification

We need some BFUB application module to confirm the best GC settings!!!

Test case

Test case | Description
Online | Teller module for 75 users
Batch | Interest Accrual and Interest Accrual Posting batch process

Also, Server specification is important for GC settings!!

Server specification

Server Type | Application | Configuration
App Server | WAS 7 | 8 cores, 16 GB RAM, IBM 9133-55A, POWER5, 1.65 GHz, 64-bit
DB Server | DB2 9.7 | 8 cores, 16 GB RAM, IBM 9133-55A, POWER5, 1.65 GHz, 64-bit


Description of current GC settings and problems associated with that.

The current settings are the defaults: the initial heap size is 1/4 of the maximum heap size, and the optthruput GC policy is used.

Default GC settings

Policy | Option | Description
Optimize for throughput | -Xgcpolicy:optthruput (optional) | The default policy. It is typically used for applications where raw throughput is more important than short GC pauses. The application is stopped each time that garbage is collected.

It works as shown in the figure below:

[Figure: Heap layout before and after garbage collection]

Results

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
517 | 1399 | 19 | 41 | Min512Max2048-optthruput [baseline] | 79.2 | 62.9 | Baseline

Batch

GC time (sec) | Accrual (h:mm:ss) | Posting (h:mm:ss) | Collections | Test type | Average GC overhead % | Max GC overhead %
85 | 0:10:33 | 0:03:21 | 245 | Min512Max2048-optthruput [baseline] | 6 | 90

Problem Statement 1

Small initial heap (initial heap set to 1/4 of the maximum heap)

Reasons

- Heap contraction and expansion overhead occurs during GC, which adds to GC pause time.

- Compaction occurs if the heap is too small, fragmented, or resized, and that also adds to GC pause time.

- More collections are observed because GC happens frequently, and the overall average overhead is 19% in online and 6% in batch, which is too high.

Solution

Keep Initial memory equal to Maximum memory

Results

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
517 | 1399 | 19 | 41 | Min512Max2048-optthruput [baseline] | 79.2 | 62.9 | Baseline
283 | 777 | 11 | 69 | Min=Max2048-optthruput | 75.9 | 62.5 | 6.93849390

Batch

GC time (sec) | Accrual (h:mm:ss) | Posting (h:mm:ss) | Collections | Test type | Average GC overhead % | Max GC overhead %
85 | 0:10:33 | 0:03:21 | 245 | Min512Max2048-optthruput [baseline] | 6 | 90
29 | 0:10:22 | 0:03:10 | 79 | Min=Max2048-optthruput | 3 | 87

Observations with the changed settings

- No contraction/expansion, so no time is wasted in heap resizing.

- No compaction time is wasted, since a contiguous heap is available throughout the test and no resizing is needed.

- Fewer collections were observed, and the overall average overhead is reduced to 11% in online and 3% in batch.

- Around 7% improvement in response time.

Although we have solved one problem, BFUB still incurs 11% GC overhead in the online process, so it is still in the red-hot zone. Now let's look at the next important aspect, the GC policy, which is simply a different way of doing GC.

Problem Statement 2

Which is the best GC policy for BFUB?

Reasons

· GC overhead is around 19%, which is too high, so we cannot rely only on the default throughput-optimized policy (optthruput).

· We need a GC policy that can use concurrent GC and can distinguish between short-lived and long-lived objects.

· Any further performance improvement will come from reducing GC activity.

Solution

IBM provides several GC policies to address this concern. Let's test them one after the other!!!

Optavgpause (should never be used for BFUB)

Policy | Option | Description
Optimize for pause time | -Xgcpolicy:optavgpause | Trades high throughput for shorter GC pauses by performing some of the garbage collection concurrently. The application is paused for shorter periods.

optavgpause is an alternative GC policy designed to keep pauses to a minimum. It does not guarantee a particular pause time, but pauses are shorter than those produced by the default GC policy. The idea is to perform some garbage collection work concurrently while the application is running. This is done in two places:

  • Concurrent mark and sweep: Before the heap fills up, each mutator helps out and marks objects (concurrent mark). There is still a stop-the-world GC, but the pause is significantly shorter. After GC, the mutator threads help out and sweep objects (concurrent sweep).
  • Background GC thread: One (or more) low-priority background GC threads perform marking while the application is idle.

[Figure: Distribution of CPU time between mutators and GC threads in the optavgpause policy]

Result

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
517 | 1399 | 19 | 41 | Min512Max2048-optthruput [baseline] | 79.2 | 62.9 | Baseline
283 | 777 | 11 | 69 | Min=Max2048-optthruput | 75.9 | 62.5 | 6.9384939
513 | 513 | 0 | 13 | Min=Max2048-optavgpause | 94.4 | 85.7 | -593.36253

Batch

GC time (sec) | Accrual (h:mm:ss) | Posting (h:mm:ss) | Collections | Test type | Average GC overhead % | Max GC overhead %
85 | 0:10:33 | 0:03:21 | 245 | Min512Max2048-optthruput [baseline] | 6 | 90
29 | 0:10:22 | 0:03:10 | 79 | Min=Max2048-optthruput | 3 | 87
11 | 0:10:11 | 0:03:18 | 84 | Min=Max2048-optavgpause | 1 | 75

Observations with the changed settings:

As IBM states, there is an obvious degradation in throughput (around 5-10%) because GC happens concurrently with the application threads (concurrent mark and sweep).

But the BFUB application shows far more degradation than IBM indicates: roughly 600% in response time.

We need to understand why response time degrades that much. The likely reason is that the concurrent mark and sweep threads consume more CPU, leaving fewer resources available for the BFUB application.

So, for now, a permanent goodbye to the optavgpause GC policy for the BFUB application.

Gencon (The best!!)

Policy | Option | Description
Generational concurrent | -Xgcpolicy:gencon | Handles short-lived objects differently than objects that are long-lived. Applications that have many short-lived objects can see shorter pause times with this policy while still producing good throughput.

A generational garbage collection strategy considers the lifetime of objects and places them in separate areas of the heap. In this way, it tries to overcome the drawbacks of a single heap in applications where most objects die young -- that is, where they do not survive many garbage collections.

With generational GC, objects that tend to survive for a long time are treated differently from short-lived objects. The heap is split into a nursery and a tenured area, as illustrated in the figure below. Objects are created in the nursery and, if they live long enough, are promoted to the tenured area. Objects are promoted after having survived a certain number of garbage collections. The idea is that most objects are short-lived; by collecting the nursery frequently, these objects can be freed up without paying the cost of collecting the entire heap. The tenured area is garbage collected less often.

[Figure: New and old areas in gencon garbage collection]

As you can see in the figure, the nursery is in turn split into two spaces: allocate and survivor. Objects are allocated into the allocate space and, when that fills up, live objects are copied into the survivor space or into the tenured space, depending on their age. The spaces in the nursery then switch use, with allocate becoming survivor and survivor becoming allocate. The space occupied by dead objects can simply be overwritten by new allocations. Nursery collection is called a scavenge; the figure below illustrates what happens during this process:

[Figure: Example of heap layout before and after GC]

When the allocate space is full, garbage collection is triggered. Live objects are then traced and copied into the survivor space. This process is really inexpensive if most of the objects are dead. Furthermore, objects that have reached a copy threshold count are promoted into the tenured space. The object is then said to be tenured.

As the name Generational concurrent implies, the gencon policy has a concurrent aspect to it. The tenured space is concurrently marked with an approach similar to the one used in the optavgpause policy, except without concurrent sweep. All allocations pay a small throughput tax during the concurrent phase. With this approach, the pause time incurred from the tenure space collections is kept small.

The figure below shows how the execution time maps out when running gencon GC:

[Figure: Distribution of CPU time between mutators and GC threads in gencon]

A scavenge is short (shown by the small red boxes). Gray indicates that concurrent tracing starts followed by a collection of the tenured space, some of which happens concurrently. This is called a global collection, and it includes both a scavenge and a tenure space collection. How often a global collection occurs depends on the heap sizes and object lifetimes. The tenured space collection should be relatively quick because most of it has been collected concurrently.

Results

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
517 | 1399 | 19 | 41 | Min512Max2048-optthruput [baseline] | 79.2 | 62.9 | Baseline
283 | 777 | 11 | 69 | Min=Max2048-optthruput | 75.9 | 62.5 | 6.9384939
513 | 513 | 0 | 13 | Min=Max2048-optavgpause | 94.4 | 85.7 | -593.3625
155 | 2206 | 6 | 100 | Min=Max2048-gencon | 76.2 | 50.9 | 39.198494

Batch

GC time (sec) | Accrual (h:mm:ss) | Posting (h:mm:ss) | Collections | Test type | Average GC overhead % | Max GC overhead %
85 | 0:10:33 | 0:03:21 | 245 | Min512Max2048-optthruput [baseline] | 6 | 90
29 | 0:10:22 | 0:03:10 | 79 | Min=Max2048-optthruput | 3 | 87
11 | 0:10:11 | 0:03:18 | 84 | Min=Max2048-optavgpause | 1 | 75
20 | 0:10:00 | 0:03:10 | 280 | Min=Max2048-gencon | 2 | 100

Observations with the changed settings

-The mean occupancy in the nursery is 3%. This is low, so the gencon policy is probably an optimal policy for this workload.

-Approximately 40% improvement in response time.

-Average GC overhead is reduced to 6% in online and 2% in batch.

-Reduction of around 12% in CPU usage.

Subpool (Ok but…)

Policy | Option | Description
Subpooling | -Xgcpolicy:subpool | Uses an algorithm similar to the default policy's but employs an allocation strategy that is more suitable for multiprocessor machines. We recommend this policy for SMP machines with 16 or more processors. This policy is only available on IBM pSeries® and zSeries® platforms. Applications that need to scale on large machines can benefit from this policy.

The subpool policy can help increase performance on multiprocessor systems. As I mentioned earlier, this policy is available only on IBM pSeries and zSeries machines. The heap layout is the same as that for the optthruput policy, but the structure of the free list is different. Rather than having one free list for the entire heap, there are multiple lists, known as subpools. Each pool has an associated size by which the pools are ordered. An allocation request of a certain size can quickly be satisfied by going to the pool with that size. Atomic (platform-dependent) high-performing instructions are used to pop a free list entry off the list, avoiding serialized access. The figure below shows how the free chunks of storage are organized by size:


[Figure: Subpool free chunks ordered by size]

When the JVMs start or when a compaction has occurred, the subpools are not used because there are large areas of the heap free. In these situations, each processor gets its own dedicated mini-heap to satisfy requests. When the first garbage collection occurs, the sweep phase starts populating the subpools, and subsequent allocations mainly use subpools.

The subpool policy can reduce the time it takes to allocate objects. Atomic instructions ensure that allocations happen without acquiring a global heap lock. Mini-heaps local to a processor increase efficiency because cache interference is reduced. This has a direct effect on scalability, especially on multiprocessor systems. On platforms where subpool is not available, generational GC can provide similar benefits.

Results

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
517 | 1399 | 19 | 41 | Min512Max2048-optthruput [baseline] | 79.2 | 62.9 | Baseline
283 | 777 | 11 | 69 | Min=Max2048-optthruput | 75.9 | 62.5 | 6.938493
513 | 513 | 0 | 13 | Min=Max2048-optavgpause | 94.4 | 85.7 | -593.3625
155 | 2206 | 6 | 100 | Min=Max2048-gencon | 76.2 | 50.9 | 39.19849
245 | 638 | 7 | 89 | Min=Max2048-subpool | 70.9 | 56.3 | 22.92960

Batch

GC time (sec) | Accrual (h:mm:ss) | Posting (h:mm:ss) | Collections | Test type | Average GC overhead % | Max GC overhead %
85 | 0:10:33 | 0:03:21 | 245 | Min512Max2048-optthruput [baseline] | 6 | 90
29 | 0:10:22 | 0:03:10 | 79 | Min=Max2048-optthruput | 3 | 87
11 | 0:10:11 | 0:03:18 | 84 | Min=Max2048-optavgpause | 1 | 75
20 | 0:10:00 | 0:03:10 | 280 | Min=Max2048-gencon | 2 | 100
24 | 0:09:36 | 0:03:10 | 66 | Min=Max2048-subpool | 3 | 94

Limitations of subpool

- The subpool policy can help increase performance on multiprocessor systems. As I mentioned earlier, this policy is available only on IBM pSeries and zSeries machines.

- Overhead is higher (7%): compaction is happening, which increases allocation-failure/GC pause time during the test.

- On average, GC pause time is higher because more memory has to be reclaimed per collection with subpool (375 ms) than with gencon (120 ms). [See the GC pause times in the graph below.]

We have now shown that gencon is the best GC policy for the BFUB application.

Now What!!!

Problem Statement 3

Tuning gencon policy

Reason

As discussed earlier, the nursery is where short-lived objects are stored; objects move to the tenured area only if they survive a certain number of GCs. During our analysis we observed that BFUB creates far more short-lived objects than long-lived ones, so experimenting with the nursery size should yield some gains.

Solution

Tuning the nursery size (-Xmn)

- Default: nursery = 25% of the maximum heap size, with the remaining 75% as the tenured area.

- Nursery = 50% of the maximum heap size, tenured area = 50%.

- Nursery = 75% of the maximum heap size, tenured area = 25%.
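With the 2 GB fixed heap used in these tests, the three splits correspond to generic JVM arguments roughly like these (values are specific to this test configuration):

-Xms2048m -Xmx2048m -Xgcpolicy:gencon              (default nursery, 25%)

-Xms2048m -Xmx2048m -Xgcpolicy:gencon -Xmn1024m    (50% nursery)

-Xms2048m -Xmx2048m -Xgcpolicy:gencon -Xmn1536m    (75% nursery)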

Results

Online

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
155 | 2206 | 6 | 100 | Min=Max2048-gencon (default 25% nursery size) | 76.2 | 50.9 | 39.19849
107 | 1082 | 3 | 100 | Min=Max2048-gencon (50% nursery size) | 65.6 | 50.6 | 46.18440
92 | 724 | 2 | 100 | Min=Max2048-gencon (75% nursery size) | 64.6 | 49.9 | 25.32519

Observations with the changed settings

- Continuously high nursery heap occupancy was observed with the 25% nursery size setting.

- On average, GC pause time is higher with the 75% nursery size setting (137 ms) than with the 50% setting (120 ms), because more memory has to be reclaimed per collection.

- The mean occupancy of the tenured area is 87% with the 75% nursery size setting, which is high.

- The best result (46% improvement in response time) was observed with the 50% nursery size setting.

- Fewer global garbage collections were observed with the 50% nursery size (4) than with the 75% size (9).

Problem Statement 4

Remove explicit GC

Reasons

The use of System.gc() is generally not recommended, since explicit GC calls can cause long pauses and do not allow the garbage collection algorithm to optimize itself.

Solution

Use -Xdisableexplicitgc, so that any occurrences of System.gc() in the BFUB code have no effect.

Thus, we saw a great improvement in CPU utilization and transaction response time in online mode, and a significant reduction in GC pause time. The elapsed time of the batch process, on the other hand, is more or less the same (as its GC overhead was already low), but the optimized GC settings still reduced GC pause time.

Problem Statement 5

Why not test with full M&D (Teller with ATM, BPW, Lending, collateral and core) and see the impact?

Reasons

We should know the impact when all the modules are running, which is the real-life scenario in banks.

Solution

GC time (sec) | Collections | Average GC overhead % | Max GC overhead % | Test type | Max CPU (%) | Avg CPU (%) | % improvement in response time
570 | 1676 | 23 | 95 | Min512Max2048-optthruput [baseline] | 85.6 | 67.2 | Baseline
162 | 1431 | 3 | 100 | Min=Max2048-gencon / Xmn1024 | 71 | 53.3 | 46

Problem Statement 6

Why not run full EOD and see the impact?

It is reasonable to expect that we will be able to improve overall EOD timings with these settings.


Still Grey areas in Bankfusion UB

Finalizer

Using finalizers is not recommended, as they can slow garbage collection and waste space in the heap. We should consider reviewing the BFUB application for occurrences of the finalize() method. We can use the ISA tool add-on, IBM Monitoring and Diagnostic Tools for Java - Memory Analyzer, to list objects that are only retained through finalizers.

LOA

LOA is large object allocation. A large object is one that occupies more than 64 KB in the heap. The more large objects there are, the longer the GC pause times, so large-object allocation should be minimized or avoided.

There are no dumb questions in BankfusionUB

Q1. How do GC settings change with the system and the transaction profile?

From the above discussion it is clear that the product has to decide whether it needs response-time improvement, throughput improvement, or a trade-off between the two; the transaction requirement is therefore very important with respect to GC settings. The system also matters, because GC is a CPU-intensive activity and the number of GC threads depends on the configuration. GC settings should therefore be chosen according to the system configuration.

Q2. Why not give the maximum heap available on the system?

No; it is a myth that we should simply give the maximum amount of heap. GC settings should always be recommended after iterative testing and comparison of results. The problem with a very large heap is that it may need to clear a big pile of garbage, pausing the system for longer, and if the heap is fragmented the GC threads may need longer to mark the objects.

Q3. Is there a magic formula we can give to customers so that BFUB can be tuned in the customer environment?

There is no magic formula, but a "magic" analysis can be done according to the system requirements and the transaction peak, with some of our magic tools.


Conclusion/Best Practices

Hopefully the GC settings explained in this document will take care of everything!!

But GC settings still depend on the system and the transaction volume.

Thus, the final GC settings are:

Initial heap size = Maximum heap size

GC Policy: gencon (-Xgcpolicy:gencon)

Nursery size = 50% of Maximum heap size (-Xmn1024m)

Disable explicit GC= -Xdisableexplicitgc

To print the GC parameters: -verbose:sizes

Maximum heap size will depend on the system configuration and throughput requirement.
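Putting it together, the generic JVM arguments for the app server in this test would look roughly like this (a sketch; the heap and nursery sizes are the 2 GB test values and should be adjusted per system):

-Xms2048m -Xmx2048m -Xgcpolicy:gencon -Xmn1024m -Xdisableexplicitgc -verbose:gc -verbose:sizes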

Sources

http://www.iecc.com/gclist/GC-faq.html

http://www.ibm.com/developerworks/ibm/library/i-gctroub/

http://www.ibm.com/developerworks/java/library/j-ibmjava2/

http://www.ibm.com/developerworks/java/library/j-ibmjava3/

http://www.performancewiki.com/was-tuning.html

http://publib.boulder.ibm.com/infocenter/ieduasst/v1r1m0/index.jsp?topic=/com.ibm.iea.was_v7/was/7.0/ProblemDetermination/WASv7_GCMVOverview/player.html

http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.60/diag/appendixes/cmdline/cmdline_gc.html