Collecting statistics (Data Virtualization)
To optimize query performance, make sure that you are collecting statistics on the data that is being queried.
The decisions that are made by the cost-based optimizer are crucial to determine the performance of queries. The optimizer makes its decisions based on statistical information that it has about the data that is being queried. Accurate and up-to-date statistics on the data are required to ensure optimal query performance. Statistics should be collected whenever the following conditions apply:
- A new table is created and populated with data.
- An existing table’s data undergoes significant changes such as the following conditions:
- New data is added.
- Old data is removed.
- Existing data is updated.
For the optimizer to have enough information to build efficient execution plans, statistics should be gathered for:
- Any table that is referenced in a query.
- All columns that are referenced in predicates (including join predicates) and aggregation functions that are referenced in queries.
- All columns that are referenced in informational constraints.
- Column group statistics for columns with implied relationships.
You don't need to gather statistics for columns that appear only in the first SELECT list of a query.
To gather statistics on virtualized objects, you can use the SYSPROC.NNSTAT
procedure. The statistics are saved in the system catalog. For more information, see the SYSPROC.NNSTAT stored procedure.
- Virtualized Tables
- When a table is virtualized and the remote data source supports statistics collection locally, the statistics at the remote data source are used to update the local statistics.
- Virtualized Files
- When a file is virtualized, it is important to gather statistics on the virtualized object for
improving the performance of queries that access that data. The
SYSPROC.NNSTATprocedure should be called to collect the key statistics for this new table by using the data-based method. If the file undergoes a significant change, theSYSPROC.NNSTATprocedure should be called again to collect the new statistics. - Views
- When you create a virtual view, it is important to make sure that statistics are collected for all the tables that are referenced in the view, as described in the previous sections.
Checking for missing statistics
To determine whether table level statistics are missing for a table, you can use the following query:
select cast(TABNAME as varchar(25)), TYPE, COLCOUNT, STATS_TIME from SYSCAT.TABLES
where TABSCHEMA='<schema>'
and CARD = -1;
Replace <schema>with the schema name of the virtual table. The command
lists all the objects in the schema that have a CARD of -1, which is an indication
that table level statistics have not been collected.
A similar query can be used to check for missing column statistics:
select cast (COLNAME as varchar(25)) from SYSSTAT.COLUMNS
where TABSCHEMA = '<schema>'
and TABNAME = '<table_name>'
and COLCARD = -1;
Replace <schema>with the schema name and <table_name>
with the name of the virtual table. The command lists all the objects in the schema that have a
COLCARD of -1, which is an indication that column level statistics have not been
collected.
Collecting missing statistics
The following example demonstrates a call to gather statistics with the data-based
SYSPROC.NNSTAT procedure.
SYSPROC.NNSTAT on a virtualized table. The catalog-based method deletes any
previously collected statistics, which has an adverse impact on query performance.call SYSPROC.NNSTAT(NULL,
'<schema>',
'<table_name>',
'<list_of_columns_commonly_used_in_predicates>',
'',
2,
'/tmp/<table_name>.log',
?,
1);
This call collects table level statistics and column level statistics for the specified columns
in <list_of_columns_commonly_used_in_predicates>. If the table has a small
number of columns, you can collect statistics for all columns in the table by passing
NULL for the column list argument. If you don't want to collect column statistics,
specify the empty string, '', as the fourth parameter.
For more information, see the SYSPROC.NNSTAT stored procedure.