When tuning SQL, one of the first things I like to check in an execution plan is whether I have the correct join order.

Join order is considered by many to be one of the fastest and most reliable ways to tune SQL.  Dan Tow, one of the pioneering innovators in SQL tuning, wrote a book that is considered by many to be the foundational book on tuning SQL, and the entire book focuses on join order.  The main concept Tow conveys in his book is that you want to start with your most filtered table first.  Allow me to explain a peculiar situation involving the following query.
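
Roughly, it looks like this (the join columns A_ID and C_ID are placeholder names I’m assuming for illustration; only the shape of the joins and filters matters):

  SELECT COUNT(*)
    FROM table_a a
    JOIN table_b b ON b.a_id = a.a_id
    JOIN table_c c ON b.c_id = c.c_id
   WHERE a.q = 1                   -- unique, no nulls: 1 of 1,000 rows (0.1%)
     AND c.s BETWEEN 2 AND 6;      -- 5 of 10 rows (50%)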

TABLE_A has 1,000 rows, and column “Q” is unique with no nulls.  So the filter Q = 1 gives us a selectivity of 0.1% (hey! that’s pretty good).

TABLE_C has 10 rows, and column “S” is unique with no nulls.  So the filter  S between 2 and 6  will return 5 out of the 10 total rows, giving us a selectivity of 50% (not such a great filter).

TABLE_B has no filters.  TABLE_B is a child of both TABLE_A and TABLE_C.  TABLE_B has 100,000,000 rows (or thereabouts).

One of the things that Tow encourages us to do is to graph the query.  Here’s what the graph looks like:
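
Roughly this: TABLE_B, the big child table, sits on top and joins down to its two parents, with each filter noted underneath.

                 TABLE_B
        (100,000,000 rows, no filter)
             /              \
            v                v
       TABLE_A              TABLE_C
       Q = 1                S BETWEEN 2 AND 6
       (0.1%)               (50%)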

(Pardon the ASCII art; I’ve still not got all the bugs worked out with uploading pictures since my migration.)

The following will help simplify things, in case you’ve already read Dan Tow’s excellent book:

  • In his book, Dan Tow talks about “hidden filters” where you lose rows when joining from one table to another.  There are no hidden filters in this example.
  • Dan Tow talks about special ways to handle tables that return only one row.  True, TABLE_A returns only one row, but let’s ignore that concept purely for the sake of simplification.

By all accounts, the “right” join order here would be:

  • TABLE_A first (since it has the strongest filter; only 0.1% of the records are returned)
  • Then TABLE_B, since that’s the only table that joins directly to TABLE_A
  • Then TABLE_C, since that’s the only table remaining of the 3.

But! If we run the following query that forces the “right” join order:
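
(A sketch, using Oracle’s LEADING hint and the same placeholder join columns as above, to force TABLE_A -> TABLE_B -> TABLE_C.)

  SELECT /*+ LEADING(a b c) */ COUNT(*)
    FROM table_a a
    JOIN table_b b ON b.a_id = a.a_id
    JOIN table_c c ON b.c_id = c.c_id
   WHERE a.q = 1
     AND c.s BETWEEN 2 AND 6;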

It consistently runs for about 32 seconds.

If, however, we run it in the opposite join order (TABLE_C first, then TABLE_B, then TABLE_A):
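
(The same sketch with the hint reversed.)

  SELECT /*+ LEADING(c b a) */ COUNT(*)
    FROM table_a a
    JOIN table_b b ON b.a_id = a.a_id
    JOIN table_c c ON b.c_id = c.c_id
   WHERE a.q = 1
     AND c.s BETWEEN 2 AND 6;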

It runs in around 13-14 seconds.

How is this possible?  Scroll down for the answer below.

(You may want to think about it for a bit before you scroll down and read the answer; see if you can figure it out for yourself if you haven’t already.  Think about why you would want to start the join with TABLE_A, and why the concept of joining with the most filtered table first “works”.)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ANSWER: Extreme data skew.

So in a SQL query, the database can only join two tables together at any given point in time.  Usually, there is some form of “join uniformity” at play.

Suppose we have two tables: CUSTOMERS and ORDERS.

Let’s say we have 1000 customers, and each customer places, on average, 100 orders.  This means CUSTOMERS has 1000 rows, and ORDERS has 100,000 rows.

In most situations, you would assume the number of ORDERS to be somewhat evenly distributed…at least (very) roughly.  For example, one customer makes 100 orders…maybe one here has 50 orders, and another there has 150 orders…perhaps two others have 70 orders and 130 orders respectively.  If you’re doing an inner join between CUSTOMERS and ORDERS…but you have a filter that eliminates 50% of the customers, then you would expect that approximately 50% of the orders would be eliminated as well.

So to answer a query joining CUSTOMERS with ORDERS, you would expect the result to be pared down by approximately 50% as well.
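
To make that concrete (the REGION column and the value 'WEST' are made-up stand-ins for “a filter that keeps about half the customers”):

  SELECT COUNT(*)
    FROM customers c
    JOIN orders o ON o.cust_id = c.cust_id
   WHERE c.region = 'WEST';    -- keeps roughly 500 of the 1,000 customers
  -- with a roughly uniform join, we'd expect about 500 * 100 = 50,000
  -- of the 100,000 orders to survive as well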

NOW, let’s propose a similar, but slightly different situation.  Say you have 1000 customers, and you still have 100,000 orders.  The majority of customers place only one or two orders (the other 999 customers average about one and a half orders each).  But you have one customer that comprises the overwhelming majority of your orders, and this “favorite” customer has made 98,500 orders.  Suppose this “favorite” customer has a CUST_ID of 1234.

If you were to put a filter on the CUSTOMERS table, “cust_id = 1234”, then that looks like a really strong filter on the CUSTOMERS table…and it is!  You’ve eliminated the 999 other customers; you only care about one.  However, when you join to the ORDERS table, due to data skew, your filter from the CUSTOMERS table hasn’t really “cascaded” to the ORDERS table much.  In most cases, there’s some sense of join uniformity helping you out, but in this extreme data skew case, the result of joining CUSTOMERS with ORDERS has only eliminated 1,500 out of 100,000 orders…so you’re still left with 98,500 rows (or 98.5%) of your biggest table.
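
This kind of skew is easy to spot with a quick aggregate on the join column (a sketch; FETCH FIRST is 12c-and-later syntax):

  SELECT cust_id, COUNT(*) AS order_count
    FROM orders
   GROUP BY cust_id
   ORDER BY order_count DESC
   FETCH FIRST 5 ROWS ONLY;
  -- in this scenario the top row would be cust_id 1234 with 98,500 orders,
  -- while everybody else has just a handful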

In my 3-table query above, TABLE_B is heavily skewed so that the one record in TABLE_A that corresponds with the filter Q = 1 joins with over 99% of the 100 million records in TABLE_B.

So when Oracle runs a query with multiple tables in it, Oracle works by joining two tables together, then taking the result of that and joining it with a third table, then taking the result of that and joining it with a fourth table, and so on.  It may not always be one-right-after-the-other as I’ve described, as a lot of it can be “pipelined”, but the general concept is there.  When Oracle ran the query that joined TABLE_A to TABLE_B then to TABLE_C, it joined TABLE_A to TABLE_B first…and took the intermediate result set of that…and joined it to TABLE_C.  Well, the intermediate result set of TABLE_A joined with TABLE_B is what hurt us here.  TABLE_B is far and away our biggest table (TABLE_A has 1000 rows, TABLE_C has 10 rows, TABLE_B has 100 million).  Joining TABLE_A to TABLE_B left us with an intermediate result set that comprised over 99% of the row count of our biggest table, so due to data skew, our filter wasn’t as good as it might have seemed.

When the intermediate result set was joined with TABLE_C, though, TABLE_C was truly uniformly joined…so the filter on TABLE_C did actually eliminate about 50% of the records…which explains why joining TABLE_C to TABLE_B then to TABLE_A was so much faster…joining TABLE_C and TABLE_B gave us an intermediate result set that was about half the size of TABLE_B, which then only had to be joined with 1000 records in TABLE_A.

 

So you think “data skew is bad, it causes your record counts to potentially blow out.”  And that’s true, but the opposite is also true.  For example, if I had changed my filter on TABLE_A to Q <> 1, it would initially *appear* to be a pretty terrible filter, since you’re only eliminating 0.1% of the records, leaving 99.9% of the records in the table to pass through…but joining the result of that filter to our heavily-skewed TABLE_B would eliminate more than 99% of your largest table in the intermediate result set.  So data skew can hurt your query, or it can help your query.  Really, one of the best ways to tune the query is to ensure that the CBO has an accurate picture of what’s going on and let it do its job.
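
In Oracle terms, “an accurate picture” usually means statistics that capture the skew, for example a histogram on the skewed join column.  A minimal sketch along those lines (A_ID is the placeholder join column from earlier, and this isn’t meant as the full fix, which is a topic for the follow-up post):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => USER,
      tabname    => 'TABLE_B',
      method_opt => 'FOR COLUMNS A_ID SIZE 254'   -- ask for a histogram on the skewed column
    );
  END;
  /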

Oh, btw, the CBO assumes join uniformity by default as well! And unless we give it some extra help, it wants to join in the order of TABLE_A -> TABLE_B -> TABLE_C.

How do we fix the above situation?  That will likely be a future blog post–this one is already too long.  Thanks for reading!