Db2 for z/OS Index Performance Recommendations

There are many design choices to weigh when building and maintaining Db2 for z/OS indexes. This article attempts to address many of those choices, but the appropriate design decisions ultimately come from a combination of knowing how an application will use the database and adequate testing.

Index Design

When designing and building a database, it is important to consider how the tables in the database will be accessed, and at what rate. If generic database design rules are followed, the physical design of most tables and indexes can be left alone, with attention focused only on the anticipated high-volume objects. When creating indexes on high-volume tables, especially tables that will accept large numbers of inserts, deletes, and perhaps even updates, it is important to keep the number of indexes to a minimum when performance is a concern, preferably only one index per table. That one index would then have to satisfy multiple functions:

  • Primary key
  • Clustering key
  • Foreign key support
  • Matching the range-partitioning of the associated table, if range-partitioned
  • Access path

If a table is read-only then there is little need to worry about the number of indexes, since indexes that never have to be maintained add no data-change overhead. However, most tables are not read-only. A common rule of thumb might be 80% reads and 20% data changes, but in this day of mass ingest those percentages could be highly inaccurate. It is important to note that index entries are not updated in place; an update to an indexed column is processed as a delete and insert of the index entry. It is also very important to know the data change rate for tables where performance is a concern, as well as the common access paths into those tables. Then a decision can be made about the number of indexes on the table. A very good rule of thumb is to start with a design that uses only one index per table! The goal is to have this index serve all purposes, whether it be clustering, partitioning, access path, etc.
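
As a minimal sketch of what a one-index design can look like, the DDL below defines a single unique index that serves as the primary key, the clustering index, and the partitioning match for a range-partitioned table. The table, column, and limit-key values are hypothetical and only illustrate the idea:

  -- Hypothetical range-partitioned table; names and limit keys are for illustration only
  CREATE TABLE ORDER_ITEM
    (ACCOUNT_ID   BIGINT       NOT NULL,
     ORDER_ID     BIGINT       NOT NULL,
     ITEM_SEQ     SMALLINT     NOT NULL,
     ITEM_DATA    VARCHAR(100) NOT NULL)
    PARTITION BY RANGE (ACCOUNT_ID)
      (PARTITION 1 ENDING AT (100000),
       PARTITION 2 ENDING AT (200000),
       PARTITION 3 ENDING AT (MAXVALUE));

  -- One index that is unique, clustering, and partitioned to match the table
  CREATE UNIQUE INDEX ORDER_ITEM_IX1
    ON ORDER_ITEM (ACCOUNT_ID, ORDER_ID, ITEM_SEQ)
    CLUSTER
    PARTITIONED;

  -- The existing unique index becomes the enforcing index for the primary key
  ALTER TABLE ORDER_ITEM
    ADD PRIMARY KEY (ACCOUNT_ID, ORDER_ID, ITEM_SEQ);

Because the index key begins with the partitioning column, this one index can also serve as the table's partitioning index, avoiding a separate non-partitioned secondary index.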

Impact of Clustering

While clustering is something that benefits table access, it is also important to think about the index used to cluster a table and how that index is accessed. Db2 can achieve incredibly high read rates in support of batch processing when a sequential reader (usually the driving cursor for a process) reads a table in clustering-key order and the table is reasonably well organized. Depending upon insert and update activity and available free space in the table space, REORGs may be required at certain intervals to maintain this high efficiency. Access via a clustering index can benefit both sequential readers and random readers, and a common clustering between two related tables helps both. A good example is an identifying relationship between a parent table and a child table (the child table primary key begins with the leading columns of the parent table primary key). When an SQL join traverses the inner table via its primary clustering key index, the key values for the related rows are co-located, which minimizes the number of random read operations performed on behalf of the join.
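
A hedged sketch of that kind of identifying, co-clustered design follows; the table and column names are hypothetical. Both tables cluster on the parent key, so a join from parent to child reads co-located child index entries and rows:

  -- Parent table clustered by its primary key
  CREATE UNIQUE INDEX CUSTOMER_IX1
    ON CUSTOMER (CUSTOMER_ID)
    CLUSTER;

  -- Child primary key begins with the parent key (identifying relationship);
  -- the same index provides clustering and foreign key support
  CREATE UNIQUE INDEX CUST_ORDER_IX1
    ON CUST_ORDER (CUSTOMER_ID, ORDER_ID)
    CLUSTER;

  -- A join on the common leading key touches co-located keys and data pages
  SELECT C.CUSTOMER_ID, O.ORDER_ID
    FROM CUSTOMER C
         INNER JOIN CUST_ORDER O
                 ON O.CUSTOMER_ID = C.CUSTOMER_ID
   WHERE C.CUSTOMER_ID BETWEEN 1000 AND 2000;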

Db2 is able to take advantage of performance enhancers, in the form of sequential detection and index lookaside, when it detects sequential access. Certain settings, such as the bind parameter RELEASE(DEALLOCATE), including for high-performance DBATs, can improve the efficiency of these built-in performance enhancers. Reading multiple rows (or keys) that are located close together is better than reading multiple rows (or keys) that are located far apart. Therefore, if an application has a common access pattern, supporting that pattern with clustering can result in significant performance gains.

Keys and Relationships

Using identifying relationships when designing a database is the key to the one-index solution. In relational theory, a relationship between two entities may be classified as either identifying or non-identifying. Identifying relationships exist when the primary key of the parent entity is included in the primary key of the child entity as the initial columns of that child key. On the other hand, a non-identifying relationship exists when the primary key of the parent entity is included in the child entity but not as part of the child entity's primary key. Many modern-day developers will object to using identifying relationships, as they prefer a single-column system-generated unique key for each table. Therefore, not every table needs to have a one-index design, but perhaps a compromise can be made with developers so that only the tables with the highest volume of data changes get the one-index design.

UUIDs (UUID, GUID, or UID) are standard practice for many modern designs that incorporate an object-relational data stack. A standard UUID may be generated from an internal time clock or from random values, depending upon the version, and in practice it could be generated by any number of different algorithms. When UUIDs are introduced into a database design, they can influence the decisions that are made relative to indexing, clustering and free space! Therefore, it is important to understand how these IDs are generated for a particular database design, and make appropriate accommodations. Understanding whether the pattern of assignment of new UUIDs is truly random or sequential will affect the free space settings for an index: if it is random then index free space should be allocated, and if sequential then free space is useless. Perhaps it would be fruitful to address the need for natural keys for high-volume tables, or possibly compound UUID keys for critical parent-child relationships. It is possible to negotiate the natural-key "in" (meaning the natural access point to a primary entity) for a database design. Allow for clustered access based upon that natural key, and then cluster the rest of the tables by the UUID of that parent table, whether those keys are identifying or non-identifying. Remember, this is something that would be implemented for high performance. If high performance is not an expected feature of the application then there is no need to fight the design team over primary keys.

Clustering can be a challenge with a variety of generated unique IDs as keys. If Db2-generated values are used as primary keys, whether they are GENERATE_UNIQUE_BINARY() values or ascending sequence values, then maintaining cluster based upon natural keys is impossible without secondary indexes. While the table may benefit from clustering, the indexes themselves are a separate issue; it really depends upon the access patterns of the application queries and which indexes they end up using.

If sequences (identity columns or sequence objects) are used for generating unique IDs, then a common problem is the use of these sequence values without an appropriate cache setting. The default cache size is 20 per Db2 data sharing member, but if there is a high usage rate that may be too low, especially in a data sharing environment. I have used a cache value as high as 1000 for high-volume applications. Setting ORDER and NO CACHE for sequence values is extremely detrimental to performance! Those settings exist specifically for application requirements that demand strict ordering of values, even across a data sharing group. The reason they are detrimental to performance is that there is absolutely no caching of values; Db2 has to go to the system catalog for each value, and this can result in serious contention for high-volume transactions.
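
As an illustrative sketch (object names are hypothetical, and the cache value is only an example of a larger setting), a high-volume sequence or identity column could be defined with a large cache and without strict ordering:

  -- Sequence object with a large cache; NO ORDER lets each data sharing member
  -- assign values from its own cached range instead of synchronizing every value
  CREATE SEQUENCE ORDER_ID_SEQ
    AS BIGINT
    START WITH 1
    INCREMENT BY 1
    CACHE 1000
    NO ORDER;

  -- The same idea expressed as an identity column
  CREATE TABLE APP_ORDER
    (ORDER_ID   BIGINT NOT NULL
       GENERATED ALWAYS AS IDENTITY
       (START WITH 1, INCREMENT BY 1, CACHE 1000, NO ORDER),
     ORDER_DATA VARCHAR(100) NOT NULL);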

If management tells you “we want the application to run as fast as possible” then your response should be “well great, buy an infinite number of machines”. Ask for a reasonable expectation of performance, and if the use of UUIDs will not impede that expectation then relax, all is well. If there is fear of a performance bottleneck then special considerations should be made about the use of UUIDS, indexing, clustering, and free space!

Free Space Considerations

When an index page fills with entries, it must be split. When Db2 splits an index page it needs somewhere to put the new page, and it needs a completely empty available page to do it. In addition, Db2 serializes on the root page when the split occurs, so index page splits are a serial process: only one agent at a time can split a page. For highly parallelized processes, where multiple agents act on a single index or index partition, this can cause concurrency issues. Since indexes are maintained in key sequence, free space should be allocated so that index page splits are avoided between index REORGs. If splits cannot be avoided, then a combination of PCTFREE and FREEPAGE can be set to reduce the overhead of splits, so that when a split happens Db2 can find a page nearby to use for the new page. This is most important if the index key is a natural key.

If the index key is a system-generated ascending or descending value, or if the key is non system-generated but still ascending or descending in nature (that is, new keys are assigned in a sequence), then it is perhaps best to allocate no free space and let Db2 simply tack new entries to the end of the index. This is where it really pays off to know how your UUIDs are generated. In this situation, if the index keys are updated then some free space might be helpful.
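The two situations might translate into free space settings along these lines; the index names and values are illustrative starting points, not recommendations for any specific index:

  -- Natural (randomly inserted) key: leave room in each page and reserve an
  -- empty page within each group of 31 pages so a split can find a page nearby
  ALTER INDEX DBX.CUSTOMER_IX2
    PCTFREE 10 FREEPAGE 31;

  -- Ascending, system-generated key: new entries go at the end of the index,
  -- so free space scattered through the index is wasted
  ALTER INDEX DBX.ORDER_IX1
    PCTFREE 0 FREEPAGE 0;

  -- (Free space values take effect when the index is next REORGed, rebuilt, or loaded)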

Indexes should generally be reorganized more frequently than table spaces, because index entries must be maintained in key sequence and page splits tend to disorganize an index faster than inserts disorganize the underlying table space.
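
One hedged way to spot indexes that are overdue for a REORG is to watch the far-leaf-page split counter in the real-time statistics; the filter and ordering below are illustrative:

  -- Indexes where page splits have landed far from the original page since the last REORG
  SELECT DBNAME, NAME, PARTITION,
         REORGLEAFFAR, REORGPSEUDODELETES, TOTALENTRIES
    FROM SYSIBM.SYSINDEXSPACESTATS
   WHERE REORGLEAFFAR > 0
   ORDER BY REORGLEAFFAR DESC;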

Page Size and Compression

When an index allocated with 4K pages becomes quite large, at approximately 200 or more gigabytes in size, it can start to have a noticeable performance impact. This is because there could be significant random access to the index, which can lead to a large volume of synchronous and very inefficient I/Os. These indexes are difficult to tune because increasing free space in the form of PCTFREE and FREEPAGE also increases the size of the index, which can exacerbate the situation. This may be the perfect situation for a larger index page size! Actually, a larger index page size may benefit many indexes, not just the very large ones.
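
Moving an index to a larger page size is done by assigning it to a buffer pool of the larger page size; the index and buffer pool names below are placeholders:

  -- Move the index from a 4K buffer pool to an 8K buffer pool; the new page size
  -- takes effect when the index is subsequently REORGed or rebuilt
  ALTER INDEX DBX.BIG_TABLE_IX1
    BUFFERPOOL BP8K0;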

Index compression can be used to save physical disk storage and requires use of a larger index page size. For a very thorough conversation about index compression please read this article on the IDUG web site by Brian Laube. An IDUG login is required, but it is free! One final note on index compression is that DSN1COMP should definitely be used to predict the proper compression rate and page size, as improper use of index compression could adversely impact performance, memory or disk usage.
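
A compressed index must reside in a buffer pool larger than 4K (8K, 16K, or 32K), even though its pages are written to disk compressed. A minimal sketch, with hypothetical object names, might look like this:

  -- Compressed index; compression requires an 8K, 16K, or 32K buffer pool
  CREATE INDEX DBX.BIG_TABLE_IX2
    ON BIG_TABLE (COL1, COL2)
    BUFFERPOOL BP16K0
    COMPRESS YES;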

The true advantage of a larger index page size is something called index fan-out. Db2 uses a b-tree index structure, with a root page that points either to leaf pages or to intermediate non-leaf pages, which in turn point to other non-leaf pages or to leaf pages. Index fan-out is defined as the number of children a non-leaf page in a b-tree can point to. Therefore, a larger page size yields a greater level of index fan-out, which makes the index tree "bushier", or fatter rather than taller. This has a double effect, or perhaps a quadruple effect with compression (disk storage only, as compressed index pages are expanded in the buffer pool), of saving buffer memory for the indexes.

For example, if an index contains 137,160 active pages and 136,120 leaf pages, then it has about 1,040 non-leaf pages and a fan-out of about 131 (136,120/(137,160-136,120)). Increasing the page size to 8K cuts the number of leaf pages roughly in half, but it also roughly doubles the fan-out because the non-leaf pages are larger as well; to reiterate, larger non-leaf pages point to more leaf pages! Using simple math, 136,120/2 is 68,060 leaf pages with an 8K page size. The non-leaf pages are reduced by a factor of about 4 to roughly 260, with the fan-out roughly doubled to about 262. So the 4K index is 137,160 pages, or about 560MB, with 4.2MB required for non-leaf pages. The 8K index is still about 560MB, but with only about 2MB required for non-leaf pages.

That means there is a greater opportunity for more of the non-leaf pages to remain in buffer pool memory, resulting in a potential performance improvement. The potential savings is twofold. First, there could be fewer I/Os because there are fewer pages to read in the larger page size index. Second, because of the improved fan-out the number of non-leaf pages is reduced, so they require less memory in the buffer pool, with the 8K non-leaf pages needing half the memory of the 4K non-leaf pages. This assumes that, in general, more non-leaf pages than leaf pages will be retained in the buffer pool (depending upon access patterns) under a least recently used page steal algorithm. Additionally, there could be fewer index levels to traverse. For large indexes the savings could be really dramatic. This example was actually for one partition of a 200-partition index, so the potential buffer pool memory savings is about 440MB for just this one index in a database with 130 tables. At that level, the memory and I/O savings really start to add up!
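
To find similar candidates in your own subsystem, the RUNSTATS leaf page and level counts in the catalog are a reasonable starting point; the NLEAF threshold below is arbitrary and only for illustration:

  -- Large indexes (by RUNSTATS leaf page count) that may be candidates
  -- for a larger page size
  SELECT CREATOR, NAME, NLEAF, NLEVELS
    FROM SYSIBM.SYSINDEXES
   WHERE NLEAF > 100000
   ORDER BY NLEAF DESC;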

Monitoring and Tuning

In a previous article I highlighted the proper monitoring for REORGs of table spaces and indexes using the real-time statistics catalog tables. Beyond the information in the real-time statistics tables, I recommend setting up a few things that make indexes easier to monitor. For example, placing indexes in a separate buffer pool from table spaces can make a dramatic difference in monitoring. Separating an extremely high-volume index into its own buffer pool is another good idea. Then, when producing an accounting report for a particular application, it is easy to determine whether an I/O issue is related to table spaces or indexes just by looking at the buffer pool information. I highlight the importance of accounting traces and reports in this article.
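
Assigning indexes to their own buffer pool is a simple ALTER (the index and buffer pool names below are placeholders); an accounting report then breaks out getpage and I/O counts by pool, so index activity is visible separately from table space activity:

  -- Keep index pages in a pool separate from data pages
  ALTER INDEX DBX.ORDER_IX1 BUFFERPOOL BP3;

  -- Give one extremely high-volume index its own dedicated pool
  ALTER INDEX DBX.HOT_TABLE_IX1 BUFFERPOOL BP4;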

Summary

Be careful when creating additional indexes on a table, especially tables with a high insert or delete rate, and tables whose index keys would be heavily updated. For partitioned tables, non-partitioned secondary indexes and data-partitioned secondary indexes can present further challenges in the form of large size and/or access path issues. There are many things to consider, and the biggest benefit will come from understanding the application behavior. Remember, it is easy to create an index and very challenging to drop one.
