Db2 SQL and Application Performance: Part 5, Using Data Change Table References to Reduce Calls to the Data Server

This is the fifth of several blog articles addressing Db2 performance. All the content of these articles will be based upon Db2 for z/OS. However, much of it can also be applied to Db2 for LUW. The articles will address mostly application performance regarding coding efficient SQL statements as well as efficient Db2 application coding techniques and settings. There is a saying in the industry, “it depends”, which is just another waying of saying “I don’t know”. While these articles will avoid using either of these terms, please keep in mind that your result can and will be different. With that in mind any design decision should be thoroughly tested prior to implementation. For testing references please see

https://www.db2expert.com/db2expert/testing-db2-design-decisions/

Call Only When Necessary

Yes, indeed the most efficient SQL statement is the one that is never executed. There are two things to remember about Db2. First, it’s a black box of sorts. That is, under the covers it is managing data in a vast number of physical datasets, maintaining the associated metadata that obscures the actual physical location and organization of the data. Thus, relieving the application developer of any knowledge of the physicality of the data they are accessing. All the developer needs to know is the tables and columns they need to access. That comes with an associated cost such that a database call will never be as cheap as a native file system call. Second, any call to Db2 from an application program (all calls to Db2 are from a program) is an external call from that application program. For example, if the call is in a CICS program or COBOL batch program to Db2 for z/OS located in the same z/OS LPAR it is considered a cross-memory call from one address space to another. If the call is coming from a remote application on another server, then that call is not only external from the Db2 distributed address space, but it involves traversing a network as well.

So, how can calls to the Db2 be reduced? There are a lot of things that can be done in this regard, and this is the single most important thing that an application developer can do to improve performance. You see, when a DBA sees an application with excessive “in Db2” times, they can investigate individual SQL statements that the application executes. If they see a few SQL statements using a lot of resources it can be an easy fix without application changes, such as adding an index or running statistics. However, if what they see is a whole lot of inexpensive statements executing then there is little a DBA can do to fix the issue. In this case it’s all on the application developer to reduce the number of calls.

Please see article 1 for call avoidance recommendations.

Please see article 2 for using SQL joins to avoid calls to the data server.

Please see article 3 for using subqueries to avoid calls to the data server as well as performance choices for subqueries and joins.

Please see article 4 for using stored procedures to reduce calls to the data server.

In this article I’m going to present using data change table references as a way to avoid excessive calls to the data server!

What is a Data Change Table?

A table reference in a SQL statement is more than just a portion of the statement that points to a Db2 table. According to the SQL reference manual, a table-reference specifies a result table as either a table or view, or an intermediate table. So, I am assuming that everyone who accesses Db2 data is familiar with tables and views. What some people may not realize is that the result of a SQL statement is in fact a table. Thus, the reference to “result table” in the definition of a table reference. So, if a SQL statement produces a result table, then that table can be made available for further SQL processing. Thus, SQL produces output that can be used as input to SQL!

The approximate limit to the number of table references in a SQL statement is 1,024 so there can be a lot of flexibility, as well as complexity! As of the release of this article here is the complete list of table references:

  • single-table-reference
  • single-view-reference
  • nested-table-expression
  • data-change-table-reference
  • table-function-reference
  • table-locator-reference
  • xmltable-expression
  • collection-derived-table
  • joined-table

A data-change-table-reference is defined as an intermediate result table that is the set of rows that are directly affected by the data change statement. In other words, the result table in this case represents the rows that are directly changed by a SQL data change statement. There are some serious limitations to a data-change-table-reference, the primary one being that it can only be specified in the FROM clause of the outer fullselect that is part of a SELECT statement. It is more complicated than that, but for now we’ll focus on the fact that only the outer most SELECT can be from a data-change-table-reference. However, what this means is that the affected rows from an INSERT, UPDATE, DELETE, or MERGE statement can be read via a SELECT. From a performance perspective, any time you can do multiple things in a single message to the data server is an advantage, especially across a network!

Nested Table Expressions

In blog article 3 of this series I hinted at the concept of nested table expressions, but never really explained them. Basically, if we know that the result of a SQL statement is a table, then that table can be read by further SQL, as long as it’s part of the same statement. Let’s look at a simple example from the Db2 sample database. Let’s suppose we wanted to know the highest and lowest of the total salaries, by department, in our database. One way to solve it would be to get the total salaries for each department and sort through it until the highest and lowest total salaries were determined.

SELECT EMP.WORKDEPT, SUM(EMP.SALARY)
FROM EMP EMP
GROUP BY EMP.WORKDEPT

However, since the result of this query is a table then it can be embedded in an outer query as an intermediate table, commonly known as a nested table expression. So, the business requirement in this case can be met by a single SQL statement.

SELECT MAX(TOTSALARY), MIN(TOTSALARY)
FROM TABLE (
SELECT EMP.WORKDEPT, SUM(EMP.SALARY)
FROM EMP EMP
GROUP BY EMP.WORKDEPT
)
AS TOTTBL(WORKDEPT, TOTSALARY)

Notice the “TABLE” after the FROM clause and the “AS” towards the end of the query. TABLE identifies what follows is an intermediate table and although it is not always required, it’s a good idea to include it in the statement. What follows the AS is called a table specification and, in my opinion, is very important. This gives the intermediate table a name and column names, which then can be used in outer portions of the statement. Why is it important? In some situations, if an ambiguous reference is made to a column Db2 may guess which column it is referring to, which results in a SQLCODE +15, which is simply a warning. The statement completes but may not produce the correct results! Say what?

Specification of a Data Change Intermediate Result Table

In a similar fashion to a nested table expression an intermediate table can also include a data-change-table-reference. This can be the inserted rows from an INSERT statement, the deleted rows from a DELETE statement, the changed rows from an UPDATE statement, or any of the above from a MERGE STATEMENT.

  • FINAL TABLE specifies that the rows of the intermediate table represent the set of rows that were changed as they appear after completion of the data change statement.
  • OLD TABLE specifies that the rows of the intermediate table represent the set of affected rows as they existed prior to the application of the data change statement.

It is also possible to introduce generated data within the intermediate table by way of INCLUDE columns. In no way to be confused with include columns of an index, these are instead generated columns that are included in the intermediate result table of a data-change-table-reference. Confused? Now is a good time for examples!

SELECT from INSERT

A SELECT from INSERT returns newly inserted rows. This is especially useful if you are taking advantage of generated values, such as column defaults, identity or sequence values, or values generated by triggers. Suppose this table exists.

CREATE TABLE EMP2 (
EMPNO INT NOT NULL 
   GENERATED ALWAYS AS IDENTITY
   (START WITH 1
    INCREMENT BY 1), 
FIRSTNME VARCHAR(12) NOT NULL,  
LASTNAME VARCHAR(15) NOT NULL);

The application that adds employee data will also be adding related data and so it needs the EMPNO column to populate data into those child tables. This is traditionally handled by first acquiring the EMPNO in a separate statement referencing a table or sequence object and then performing the insert, or perhaps doing the insert and then issuing a separate statement to retrieve the identity value. In either case will result in two trips to the data server. If SELECT from INSERT is utilized, then it results in a single trip to the data server.

SELECT EMPNO, FIRSTNME, LASTNAME
FROM FINAL TABLE (
INSERT INTO EMP2 (FIRSTNME, LASTNAME)
SELECT FIRSTNME, LASTNAME
FROM   EMP
WHERE  WORKDEPT = 'C01')

The generated EMPNO column is returned from the INSERT and two requests are handled in a single statement.

SELECT from UPDATE with INCLUDE Columns

A SELECT from UPDATE can return the rows that were changed either before the update (via OLD TABLE) or after the update (via FINAL TABLE). In this example we want to give all the employees in department “C01” of the Db2 sample database a 5% raise, and we want to return their previous and new salaries. We can accomplish all of this with a single statement using FINAL TABLE and an INCLUDE column!

SELECT EMPNO, OLD_SALARY, SALARY
FROM FINAL TABLE (
UPDATE EMP 
INCLUDE (OLD_SALARY DEC(9,2))
SET SALARY = SALARY * 1.05
   ,OLD_SALARY = SALARY
WHERE WORKDEPT = 'C01')

The result of this single statement is that all the employees of department “C01” get a 5% raise, and their employee number (EMPNO), previous salary (OLD_SALARY generated INCLUDE column), and new salary (SALARY, result of the data change statement). Remember, when coding SELECT from UPDATE, either FINAL TABLE or OLD TABLE can be specified depending upon the business requirements.

SELECT FROM DELETE

A SELECT from DELETE will return the data that was removed from a table. This might be useful for reporting or archiving the removed data. In this example we are laying off all the employees in department “C01” after we realized they make two much money.

SELECT EMPNO, WORKDEPT, SALARY
FROM OLD TABLE (
DELETE FROM EMP3 
WHERE WORKDEPT = 'C01')

Of course, a report is needed so that we can inform them of our decision, so the result of this mass delete operation is returned in the outer SELECT from the intermediate table as specified in the data-change-table-reference. The employees are removed, and the report returned to the caller.

Performance

A SELECT from a data-change-table reference is important from a performance perspective in that it can eliminate calls to the data server, shorten units of work, and thus shorten the length of time that locks are held. So, a potential boost to performance and concurrency! Generated data via INCLUDE columns extend the capabilities via what is in essence programing within a SQL statement, which allows for the generation of reports from the result of the SQL data change, once again potentially eliminating multiple calls to the data server.

In this final dramatic example, we’re going to update the salary and bonus of the employees in our Db2 sample database. We going to set the salary and bonus or every employee that worked on projects owned by department “C01” to the maximum salary and bonus of all employees, and then return a report including employee number, old and new salary, and bonus. This could be a complicated application process with a relatively extended unit of work, or it could be a single SQL statement!

Wow! Pretty cool.

Coming in the Next Blog

In the next article I’ll talk about multi-row operations.

2 Comments

  1. Great great article Dan as usual! Wondering if you’re going to cover to common table expressions in the future? This would be an extension to the reference table in my opinion.

    • danadmin

      I’ll consider it, but my focus is performance so I’d have to be specific as to when it will be a performance benefit and when it might be a detriment. Thank you for the suggestion.

Leave Comment

Your email address will not be published. Required fields are marked *