Db2 SQL and Application Performance: Part 6, Multi-Row Operations

This is the sixth of several blog articles addressing Db2 performance. All the content of these articles is based upon Db2 for z/OS; however, much of it can also be applied to Db2 for LUW. The articles mostly address application performance: coding efficient SQL statements as well as efficient Db2 application coding techniques and settings. There is a saying in the industry, “it depends”, which is just another way of saying “I don’t know”. While these articles will avoid using either of these terms, please keep in mind that your results can and will be different. With that in mind, any design decision should be thoroughly tested prior to implementation. For testing references please see

https://www.db2expert.com/db2expert/testing-db2-design-decisions/

Call Only When Necessary

Yes, indeed the most efficient SQL statement is the one that is never executed. There are two things to remember about Db2. First, it’s a black box of sorts. Under the covers it is managing data in a vast number of physical datasets, maintaining the associated metadata that obscures the actual physical location and organization of the data. This relieves the application developer of any need to know the physicality of the data they are accessing; all the developer needs to know is the tables and columns they need to access. That abstraction comes with an associated cost, such that a database call will never be as cheap as a native file system call. Second, any call to Db2 from an application program (all calls to Db2 are from a program) is an external call from that application program. For example, if the call is made from a CICS program or COBOL batch program to Db2 for z/OS located in the same z/OS LPAR, it is a cross-memory call from one address space to another. If the call is coming from a remote application on another server, then that call is not only external to the Db2 distributed address space, it involves traversing a network as well.

So, how can calls to Db2 be reduced? There are a lot of things that can be done in this regard, and this is the single most important thing an application developer can do to improve performance. You see, when a DBA sees an application with excessive “in Db2” time, they can investigate the individual SQL statements that the application executes. If they see a few SQL statements using a lot of resources, it can be an easy fix without application changes, such as adding an index or running statistics. However, if what they see is a whole lot of inexpensive statements executing, then there is little a DBA can do to fix the issue. In that case it’s all on the application developer to reduce the number of calls.

Please refer to the earlier articles in this series for specific recommendations.

In this article I’m going to discuss multi-row operations!

Multi-Row Opportunities

Besides set operations (e.g., aggregations), Db2 provides for certain multi-row operations, either via mass INSERT, UPDATE, and DELETE, or via specific multi-row statements. The multi-row statements include INSERT, FETCH, and MERGE, although the usability of these statements varies between the different flavors of Db2 (z/OS, LUW, iSeries). Utilizing multi-row operations can seriously improve elapsed time for database applications.

Multi-Row Fetch

For remote applications accessing a Db2 data server over a network, a multi-row fetch operation is almost a given, dependent upon network and Db2 package settings (check your settings and the manuals). Most importantly, block fetch must be enabled. Basically, when a remote cursor is opened, the Db2 client will request multiple rows per fetch, depending upon various factors. The data server will pull the rows from the cursor and return them in a single message back to the Db2 client. The client software will then deliver the rows of the cursor one at a time to the application that requested the data, thus avoiding multiple calls to the data server and consolidating multiple messages over the network into a single message.

On Db2 for z/OS, local applications can utilize multi-row fetch via the syntax of the DECLARE CURSOR and FETCH statements. It is important to note that even though we are talking about a local application, that application still must make cross-memory calls to fetch data from the Db2 data server. This means that for every row a local application retrieves from a cursor, a cross-memory call is made. Therefore, utilizing multi-row FETCH can be extremely valuable for saving CPU and elapsed time. This feature is available for embedded SQL applications, and developers will have to code for it. This involves a rather simple process of setting up variable arrays as opposed to straight variables, and then adding a couple of key clauses to the DECLARE CURSOR and FETCH statements.

Here is an example of an array variable as defined in a COBOL program.
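What follows is a minimal sketch, assuming the Db2 sample EMP table and a rowset size of 100 rows; the names and picture clauses are illustrative and should match your own column definitions.

          * Host variable arrays sized for a rowset of up to 100 rows.
          * Picture clauses must correspond to the Db2 column types.
           01  EMP-ROWSET.
               05  HV-EMPNO              PIC X(6)
                                         OCCURS 100 TIMES.
               05  HV-LASTNAME           OCCURS 100 TIMES.
                   49  HV-LASTNAME-LEN   PIC S9(4) USAGE COMP.
                   49  HV-LASTNAME-TEXT  PIC X(15).
               05  HV-SALARY             PIC S9(7)V9(2) USAGE COMP-3
                                         OCCURS 100 TIMES.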

Here are the DECLARE and FETCH examples.
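As a sketch, again assuming the sample EMP table, the key additions are the WITH ROWSET POSITIONING clause on the DECLARE and the NEXT ROWSET and FOR n ROWS clauses on the FETCH; the cursor name is illustrative.

           EXEC SQL
             DECLARE EMPCSR CURSOR WITH ROWSET POSITIONING FOR
               SELECT EMPNO, LASTNAME, SALARY
                 FROM EMP
                ORDER BY EMPNO
           END-EXEC.

           EXEC SQL
             FETCH NEXT ROWSET FROM EMPCSR
               FOR 100 ROWS
               INTO :HV-EMPNO, :HV-LASTNAME, :HV-SALARY
           END-EXEC.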

In regard to the best number of rows to return in a FETCH operation, I have found that somewhere between 50 and 100 rows the performance improvement diminishes, so you can either test to determine what is best for your situation or just assume 100 rows.

The hardest part of coding is handling the programmatic loops when dealing with the arrays being sent back, especially the last fetch, where the program will receive SQLSTATE 02000 (SQLCODE +100) to indicate the end of the cursor even though a partial result can still remain in the variable arrays. In this case the SQLERRD(3) field of the SQLCA will contain the number of rows returned.
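Here is a hedged sketch of such a fetch loop, building on the cursor above; WS-ROW-COUNT, WS-IX, and the paragraph names are illustrative.

           PERFORM GET-ROWSET
               UNTIL SQLCODE NOT = 0.
           ...
           GET-ROWSET.
               EXEC SQL
                 FETCH NEXT ROWSET FROM EMPCSR
                   FOR 100 ROWS
                   INTO :HV-EMPNO, :HV-LASTNAME, :HV-SALARY
               END-EXEC
          * SQLERRD(3) holds the number of rows just fetched,
          * including a partial rowset on the final fetch.
               IF SQLCODE = 0 OR SQLCODE = +100
                   MOVE SQLERRD(3) TO WS-ROW-COUNT
                   PERFORM PROCESS-ROW
                       VARYING WS-IX FROM 1 BY 1
                       UNTIL WS-IX > WS-ROW-COUNT
               END-IF.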

Multi-Row Update and Delete

Multi-row updates and deletes are also possible; these are technically called mass updates or mass deletes. They are DML statements whose search criteria do not match a single row, but instead qualify multiple rows or perhaps an entire table. For example, if we look at our Db2 sample database and decide to fire all the employees that have not put in any time working, we could issue a statement like the following as opposed to querying the table and removing them one by one.
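A sketch of such a mass DELETE, assuming the sample EMP and EMPPROJACT tables and interpreting "time worked" as rows in EMPPROJACT:

        DELETE FROM EMP E
         WHERE NOT EXISTS
               (SELECT 1
                  FROM EMPPROJACT EPA
                 WHERE EPA.EMPNO = E.EMPNO);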

Now, if information about who was fired is desired, we can always embed the DELETE statement within a SELECT (see the previous blog article) to get that information and still get everything accomplished in a single SQL call.
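Sketching that as well, the same DELETE can be wrapped in a SELECT FROM OLD TABLE to return the rows that were removed:

        SELECT EMPNO, LASTNAME
          FROM OLD TABLE
               (DELETE FROM EMP E
                 WHERE NOT EXISTS
                       (SELECT 1
                          FROM EMPPROJACT EPA
                         WHERE EPA.EMPNO = E.EMPNO));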

UPDATEs and DELETEs can be quite complex, including predicates, complex SET expressions, INCLUDE columns, subqueries, etc. So, they really can be treated as their own mini programs. This enables the developer to push the data intensive code down to the database level to achieve maximum performance!
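Consider the following sketch, which assumes the sample EMP, PROJ, and EMPPROJACT tables; the business rules are illustrative.

        UPDATE EMP E
           SET (SALARY, BONUS) =
               (SELECT MAX(SALARY), MAX(BONUS)
                  FROM EMP)
         WHERE E.WORKDEPT = 'C01'
           AND EXISTS
               (SELECT 1
                  FROM EMPPROJACT EPA
                  JOIN PROJ P
                    ON P.PROJNO = EPA.PROJNO
                 WHERE EPA.EMPNO = E.EMPNO
                   AND P.DEPTNO = 'C01');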

This update statement employs a complex set of business rules to find employees from department C01 that work on projects for that department and give them a salary and bonus equal to the highest available companywide! To do this, a program would require a loop and several calls to the data server. Why not do it in a single call?

MERGE Statement

A MERGE statement merges one table into another, and its usability varies between Db2 for z/OS and Db2 for LUW. On Db2 for LUW it’s a merge of a table expression into a table, and on Db2 for z/OS it’s a merge between two tables, the first being either a table expression or a table built on the fly from variable arrays in a program. One can think of a MERGE as sort of an advanced UPSERT, in that any rows from the source table that are not found in the target table will be inserted into the target table. Likewise, any rows that are matched between the source table and target table are updated. DELETEs can also be introduced for matched or unmatched rows. So, MERGE can be very useful in data replication situations, or in message processing applications. Here is what a Db2 for z/OS embedded SQL example might look like.
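This sketch assumes host-variable arrays loaded by the program and an illustrative ACCOUNT table; the NOT ATOMIC clause lets the valid rows succeed even when individual rows fail.

           EXEC SQL
             MERGE INTO ACCOUNT T
             USING (VALUES (:HV-ACCT-ID, :HV-BALANCE)
                    FOR :HV-NROWS ROWS) AS S (ACCT_ID, BALANCE)
                ON T.ACCT_ID = S.ACCT_ID
             WHEN MATCHED THEN
               UPDATE SET T.BALANCE = S.BALANCE
             WHEN NOT MATCHED THEN
               INSERT (ACCT_ID, BALANCE)
               VALUES (S.ACCT_ID, S.BALANCE)
             NOT ATOMIC CONTINUE ON SQLEXCEPTION
           END-EXEC.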

Here is a Db2 for LUW example.
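The following is a sketch merging a table expression into a target table; the table and column names are again illustrative.

        MERGE INTO ACCOUNT T
        USING (SELECT ACCT_ID, BALANCE
                 FROM ACCOUNT_STAGE) S
           ON T.ACCT_ID = S.ACCT_ID
        WHEN MATCHED THEN
          UPDATE SET T.BALANCE = S.BALANCE
        WHEN NOT MATCHED THEN
          INSERT (ACCT_ID, BALANCE)
          VALUES (S.ACCT_ID, S.BALANCE);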

Once again, MERGE can contain INCLUDE columns and it is possible to SELECT from a MERGE!

It is important to note that while MERGE is an effective replacement for multi-row operations, it has been found not to be a real benefit in the case of INSERT/UPDATE logic. That is, many applications attempt an INSERT of a row into a table; if the INSERT fails with a SQLCODE -803 (duplicate key), then an UPDATE is performed. Replacing this logic with a MERGE makes sense, but in practice it has not paid off performance wise. So, for new applications it makes sense, but there is no need to rewrite older applications to use MERGE in this situation.

GET DIAGNOSTICS

The GET DIAGNOSTICS statement is extremely important, especially for analyzing the results of a multi-row INSERT or MERGE statement. A GET DIAGNOSTICS statement provides diagnostic information concerning the previous SQL statement issued. For multi-row operations it will include information about which individual rows in the operation resulted in an error and what specifically the error was. This is extremely useful for non-atomic executions of MERGE and multi-row INSERT where a single row can fail but the rest of the statement, meaning the other rows, is successful.
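As a sketch of how this might look in an embedded SQL program after a NOT ATOMIC multi-row statement (host variable names are illustrative):

           EXEC SQL
             GET DIAGNOSTICS :HV-NUM-CONDS = NUMBER
           END-EXEC.

          * Loop HV-COND-NUM from 1 to HV-NUM-CONDS, retrieving the
          * SQLCODE and the failing row number for each condition.
           EXEC SQL
             GET DIAGNOSTICS CONDITION :HV-COND-NUM
                 :HV-DIAG-SQLCODE = DB2_RETURNED_SQLCODE,
                 :HV-DIAG-ROWNUM  = DB2_ROW_NUMBER
           END-EXEC.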

Array Variables

While not specifically a multi-row feature, array variables can be very useful for multi-row operations. An array is an ordered set of elements of a single built-in data type. A user-defined array data type can be built as an array over a built-in data type.

There is quite a collection of array related functions provided for SQL processing. Most exciting is the ability to send the results of a SELECT into an array, and to convert one or more arrays into a table utilizing an UNNEST table expression.
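As a sketch of these two operations inside a native SQL procedure, assuming a user-defined array type over the sample EMP table (all names are illustrative):

        CREATE TYPE EMPNO_ARRAY AS CHAR(6) ARRAY[100];

        -- Inside a SQL PL routine: build the array from a query...
        SET V_EMPNOS = (SELECT ARRAY_AGG(EMPNO)
                          FROM EMP
                         WHERE WORKDEPT = 'C01');

        -- ...and turn an array back into a table with UNNEST.
        SELECT E.EMPNO, E.LASTNAME
          FROM UNNEST(V_EMPNOS) AS U(EMPNO)
          JOIN EMP E
            ON E.EMPNO = U.EMPNO;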

These two operations alone open the doors to new possibilities for statement reduction. One area of my recent exploration into this technology involved remote batch operations. A remote process was interested in expanding from real-time processing of a single input key to batch processing of multiple input keys. A stored procedure already existed for the online process. Upon investigation it was determined that the stored procedure could be modified to accept an array of keys as input. So, instead of the batch process calling the stored procedure thousands of times, it would batch up the keys into an array, send the array to the stored procedure, and then process the result sets and parameters. A nice way to turn thousands of calls into a handful of calls and save hours of batch processing time!

Here is an example of what a remote Java calling program might look like when passing an array.
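A minimal sketch using standard JDBC, assuming a stored procedure named FIND_EMPS that takes a CHAR(6) array of employee keys and returns a result set; the connection details and names are illustrative.

        import java.sql.Array;
        import java.sql.CallableStatement;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;

        public class CallWithKeyArray {
            public static void main(String[] args) throws Exception {
                try (Connection con = DriverManager.getConnection(
                        "jdbc:db2://dbhost:5021/LOCDB2", "user", "pwd")) {
                    // Batch the keys into a single SQL array instead of
                    // making one call to the data server per key.
                    String[] keys = {"000010", "000020", "000030"};
                    Array keyArray = con.createArrayOf("CHAR", keys);
                    try (CallableStatement cs =
                             con.prepareCall("CALL FIND_EMPS(?)")) {
                        cs.setArray(1, keyArray);
                        cs.execute();
                        // Process the result set returned by the procedure.
                        try (ResultSet rs = cs.getResultSet()) {
                            while (rs != null && rs.next()) {
                                System.out.println(rs.getString("EMPNO")
                                        + " " + rs.getString("LASTNAME"));
                            }
                        }
                    }
                }
            }
        }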

Coming in the Next Blog

The next blog will feature tips and suggestions for modern service-oriented applications and micro-service applications, as well as tips regarding keys and clustering.
