This is the second of several blog articles addressing Db2 performance. All the content of these articles will be based upon Db2 for z/OS. However, much of it can also be applied to Db2 for LUW. The articles will address mostly application performance regarding coding efficient SQL statements as well as efficient Db2 application coding techniques and settings. There is a saying in the industry, “it depends”, which is just another waying of saying “I don’t know”. While these articles will avoid using either of these terms, please keep in mind that your result can and will be different. With that in mind any design decision should be thoroughly tested prior to implementation. For testing references please see
https://www.db2expert.com/db2expert/testing-db2-design-decisions/
Call Only When Necessary
Yes, indeed the most efficient SQL statement is the one that is never executed. There are two things to remember about Db2. First, it’s a black box of sorts. That is, under the covers it is managing data in a vast number of physical datasets, maintaining the associated metadata that obscures the actual physical location and organization of the data. Thus, relieving the application developer of any knowledge of the physicality of the data they are accessing. All the developer needs to know is the tables and columns they need to access. That comes with an associated cost such that a database call will never be as cheap as a native file system call. Second, any call to Db2 from an application program (all calls to Db2 are from a program) is an external call from that application program. For example, if the call is in a CICS program or COBOL batch program to Db2 for z/OS located in the same z/OS LPAR it is considered a cross-memory call from one address space to another. If the call is coming from a remote application on another server, then that call is not only external from the Db2 distributed address space, but it involves traversing a network as well.
So, how can calls to the Db2 be reduced? There are a lot of things that can be done in this regard, and this is the single most important thing that an application developer can do to improve performance. You see, when a DBA sees an application with excessive “in Db2” times, they can investigate individual SQL statements that the application executes. If they see a few SQL statements using a lot of resources it can be an easy fix without application changes, such as adding an index or running statistics. However, if what they see is a whole lot of inexpensive statements executing then there is little a DBA can do to fix the issue. In this case it’s all on the application developer to reduce the number of calls.
Please see article 1 for call avoidance recommendations.
In this article I’m going to present using SQL joins with the objective of reducing calls to the Db2 data server.
Do Not Code Programmatic Joins
My very good friend Kurt Struyf and I were in California one time and ran into an employee from Google at a dive bar. When she learned what we did for a living she stated, “I love relational databases, will you please come to Google and teach my programmers how to code a join?” This was a remarkable moment because Google is a giant and one can assume that they know everything. However, there is a lot of truth in that request. Many application programmers simple learn basic SQL and as a result code very basic single SQL statements that end up being wrapped by application logic that is more complex than it needs to be. The result many times is an application that is more expensive from a database as well as an application perspective. Let’s look at a fictitious example using the Db2 sample database. Suppose there is a need to access employee information for Christine Haas and any projects that she is responsible for, if she is responsible for any at all. An application developer might code the following statements to get this information.
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE, JOB, EDLEVEL, SEX, BIRTHDATE, SALARY, BONUS, COMM FROM DSN81210.EMP WHERE LASTNAME = 'HAAS' AND FIRSTNME = 'CHRISTINE';
One this query completes the program now has the employee number for Christine Haas and issues the following query.
SELECT PROJNO, PROJNAME, DEPTNO, RESPEMP, PRSTAFF, PRSTDATE, PRENDATE, MAJPROJ FROM DSN81210.PROJ WHERE RESPEMP = '000010';
Now the program has all the information for employee Christine Haas and the projects she is responsible for, if any, and returns that information to another service or an end user’s screen. All is perfectly well, and the application likely responds quickly, but this is in fact an expensive way to code this solution. In DBA terms we call this a “programmatic join” and consider it a relatively expensive way to join data together and there’s not a lot of tuning options we can offer. A programmatic join is just that, two individual database calls, perhaps across a network, plus some application code to combine the information that is returned. If an actual SQL join was coded instead the request can be satisfied in a single call to the database.
SELECT EMP.EMPNO, EMP.FIRSTNME, EMP.MIDINIT, EMP.LASTNAME, EMP.WORKDEPT, EMP.PHONENO, EMP.HIREDATE, EMP.JOB, EMP.EDLEVEL, EMP.SEX, EMP.BIRTHDATE, EMP.SALARY, EMP.BONUS, EMP.COMM, MAJPROJ.PROJNO, MAJPROJ.PROJNAME, MAJPROJ.DEPTNO, MAJPROJ.RESPEMP, MAJPROJ.PRSTAFF, MAJPROJ.PRSTDATE, MAJPROJ.PRENDATE, MAJPROJ.MAJPROJ FROM DSN81210.EMP AS EMP LEFT OUTER JOIN DSN81210.PROJ AS MAJPROJ ON EMP.EMPNO = MAJPROJ.RESPEMP WHERE EMP.LASTNAME = 'HAAS' AND EMP.FIRSTNME = 'CHRISTINE';
This delivers the same information as the two previous statements but in a single call to the Db2 data server. It’s coded as an outer join since there could possibly be no rows in the project table related to employee 000010, Christine Haas. This statement has two potential advantages over the two separate SQL statements: first, it is only one trip to the data server which saves database processing time (CPU and elapsed) as well as potential network time (one message instead of two), second the database engine has the greater chance of selecting a more efficient access path to the data. When compared to the two individual statements, Db2 has the ability with the single statement to evaluate multiple ways to access the data. This becomes a more important factor as the SQL includes more tables in a join and more complexity. I have tested a programmatic join solution versus a SQL join for two tables in an application program and the result was at least a 30% savings in Db2 elapsed time and CPU time for the SQL join over the programmatic join. The savings could be even greater if there are more tables involved.
While the application loses some logic by eliminating two separate calls to the database, there is additional responsibility in handling the potential for multiple rows from the join. So, while there might be a trade-off in application complexity there is still significant savings potential with the elimination of one call to the database.
What about modern application development using Data Access Objects (DAO) to facilitate the data access layer of an object-oriented design? In this situation we typically see direct mappings between database tables and application entities. Automated access methods against these objects basically eliminate the possibility of SQL joins. However, with minimal effort common access patterns in the application can be identified and the most common methods that access multiple tables could be retooled to access a Db2 view as opposed to multiple entities (table objects). For those who don’t know, a view is basically a SQL query that can be defined and installed into the database and given a name. Once it is defined the view can be accessed exactly as if it were a table. So, for our example in this article the following view could be established.
CREATE VIEW DSN81210.EMPPROJECT AS ( SELECT EMP.EMPNO, EMP.FIRSTNME, EMP.MIDINIT, EMP.LASTNAME, EMP.WORKDEPT, EMP.PHONENO, EMP.HIREDATE, EMP.JOB, EMP.EDLEVEL, EMP.SEX, EMP.BIRTHDATE, EMP.SALARY, EMP.BONUS, EMP.COMM, MAJPROJ.PROJNO, MAJPROJ.PROJNAME, MAJPROJ.DEPTNO, MAJPROJ.RESPEMP, MAJPROJ.PRSTAFF, MAJPROJ.PRSTDATE, MAJPROJ.PRENDATE, MAJPROJ.MAJPROJ FROM DSN81210.EMP AS EMP LEFT OUTER JOIN DSN81210.PROJ AS MAJPROJ ON EMP.EMPNO = MAJPROJ.RESPEMP);
Now, a DAO object can be created to map to the view, and the application can then access the view like any other table in the application. An underlying SQL statement to retrieve the same information as the above examples could look like this:
SELECT EMP.EMPNO, EMP.FIRSTNME, EMP.MIDINIT, EMP.LASTNAME, EMP.WORKDEPT, EMP.PHONENO, EMP.HIREDATE, EMP.JOB, EMP.EDLEVEL, EMP.SEX, EMP.BIRTHDATE, EMP.SALARY, EMP.BONUS, EMP.COMM, EMP.PROJNO, EMP.PROJNAME, EMP.DEPTNO, EMP.RESPEMP, EMP.PRSTAFF, EMP.PRSTDATE, EMP.PRENDATE, EMP.MAJPROJ FROM DSN81210.EMPPROJECT AS EMP WHERE EMP.LASTNAME = 'HAAS' AND EMP.FIRSTNME = 'CHRISTINE';
There is an issue, however, in that this view is non-updatable. In the case of updates to the object, the application could add some simple logic to use the objects related to the EMP and PROJ tables, or there is an option to use Db2 INSTEAD OF triggers to allow data change statements against the view. In short, the INSTEAD OF triggers allow for inserts, updates, and deletes against non-updateable views.
Coming in the Next Blog
In the next blog article, I’ll continue the discussion about programming techniques which basically will deliver multiple pieces of information in single SQL calls, keeping the focus on reducing calls to the database.