The question came in today in an Orapub Slack conversation and I nearly exploded from excitement.  Below was my response.

The only way to know, as @Antonio Sant alluded to, is to answer the question “does a better plan exist?”

When Oracle figures out which plan it wants to take, it goes through the process of trying out several different variations of join order, access methods, and query transformations, and basically asks the question over and over again: “is this new plan faster than any other plan I’ve come up with so far?” (Or rather, “do I anticipate this new plan to be faster than any other plan?” Reality can differ from expectations if the optimizer doesn’t have a clear picture of what’s going on, or makes an incorrect assumption based on extrapolation.) So quite literally, the very method Oracle uses to come up with an execution plan is to try various possibilities until it arrives at the best one it can find. As humans, we can look at a plan, apply some guided creativity and various principles, but the job is essentially the same.
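If you ever want to watch that deliberation happen, the optimizer’s cost calculations can be written to a trace file with the 10053 event. A minimal sketch, assuming you have the privileges to set session events and read the trace directory (the table name is hypothetical):

    -- Capture the optimizer's plan-by-plan reasoning for one hard parse
    ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

    -- A unique comment forces a fresh hard parse of the statement you care about
    SELECT /* 10053_demo */ COUNT(*) FROM employees;  -- hypothetical table

    ALTER SESSION SET EVENTS '10053 trace name context off';
    -- The trace file lists the join orders and access paths the optimizer
    -- costed, along with the cost it assigned to each candidate.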

Things that are important to an efficient query:

First, is the query well written? This is the very first thing you want to check, and the webinar next week is all about ensuring that you have well-written queries (I’d encourage you to come!). There are certain things you want to avoid. Some of these are common sense: don’t join to tables you don’t need. Make sure you’re using bind variables where they’re appropriate and not using them when they’re unnecessary. Don’t select more columns than you need. Don’t introduce any unnecessary sorts (if you don’t need an ORDER BY, don’t specify one; if a UNION ALL will do just fine, then don’t use a UNION, so that Oracle isn’t wasting resources trying to remove duplicates where there are none; etc.). Next week’s seminar focuses on a lot of things that maybe aren’t as “common sense,” such as where and how to apply filters, how to take advantage of subqueries, etc. (I have to stop here or I risk giving my content away.)

The way your query is written determines its maximum possible level of efficiency. In other words, a well-written query can have either a good execution plan or a bad execution plan; a poorly-written query will only ever have inefficient execution plans. You may be able to speed it up some using indexes, or by changing the plan with hints/profiles/baselines, etc., but how far you can increase the efficiency is capped by how the SQL text is phrased. So if you want the best performance, start by examining the query first. (This is the step that most resources on query tuning never talk about, by the way.)
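To make the UNION point concrete, here’s a minimal sketch with hypothetical tables. If the two branches can’t produce overlapping rows, UNION pays for a duplicate-removal step that buys you nothing:

    -- UNION adds a sort/dedup step to remove duplicates across branches
    SELECT order_id FROM orders_2023
    UNION
    SELECT order_id FROM orders_2024;

    -- If the branches are known to be disjoint, UNION ALL returns the
    -- same rows without the duplicate-removal work
    SELECT order_id FROM orders_2023
    UNION ALL
    SELECT order_id FROM orders_2024;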

Second, pay attention to the execution plan. The execution plan gives you information on the following items, and you’ll want to address each of them in the order I’ve specified. Paying attention to bottlenecks (which step is taking the longest, and what wait events it has) can guide your area of focus, but know that just because a particular step is taking the longest, that doesn’t necessarily mean that’s where the problem is. Still, it’s a great place to start looking! A SQL Monitor report will help you find bottlenecks and their wait events, or if your query is too big or too complex for a monitor report, you can use ASH to find the same information.
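For reference, a sketch of pulling a SQL Monitor report with the DBMS_SQLTUNE.REPORT_SQL_MONITOR function (the sql_id is a placeholder, and SQL Monitoring generally requires the Tuning Pack license):

    -- Text-format SQL Monitor report for one statement
    SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
             sql_id => 'abc123xyz',  -- placeholder: your statement's sql_id
             type   => 'TEXT')
    FROM   dual;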

Join Order – It doesn’t matter how many tables you specify in your query; Oracle will only join two of them together at any one time. So if you have a query that joins tables A, B, C, D, and E together, Oracle may choose to first join A and B, then join the result of that with C, then join the result of that with D, and so on. Oracle is generating intermediate result sets as a result of these joins (it’s not always as simple and one-right-after-the-other as what I’ve described above, but this is the idea when it’s boiled down to its simplest, most fundamental concept). So one of the first things to examine when looking at an execution plan is the join order: generally, starting with your most-filtered tables first will tend to produce the smallest intermediate result sets. The smaller the intermediate result sets, the less data is involved in the joins that create them, and the faster the query runs. Dan Tow has written an entire book focused on how to ensure that you get the right join order (“SQL Tuning,” in the O’Reilly book series). Well worth the read if you want to get at all serious about SQL tuning.
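As an illustration (the hints are real Oracle hints; the tables and columns are hypothetical), you can override the join order to test the “most-filtered table first” idea:

    -- LEADING fixes the join order: A first, then B, then C
    SELECT /*+ LEADING(a b c) */
           a.id, b.val, c.other
    FROM   a, b, c
    WHERE  a.id = b.a_id
    AND    b.id = c.b_id
    AND    a.status = 'RARE';  -- the selective filter that makes A a good driving table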

Access methods – Next you’ll want to pay attention to how you’re accessing the data. Do you have the right indexes in place? Something to be cautious of: indexes are not always faster than full table scans. There’s a percentage-of-rows threshold on a table where indexes start to “lose their magic.” The magic threshold varies in different situations. Some say it’s around 10% of the rows in a table. I think Kevin Mead says around 2%. Carlos Sierra says “1%” and “less than 1% if you’re on Exadata” (which I think is a bit extreme). Jonathan Lewis has staged situations that are juuuuuuuuuust right, where indexes are still effective around 30%. It depends on lots of things: how well the table is organized in comparison with the index, whether the query can be fulfilled by “just using the index” or whether both the index and the table are needed, how many “layers” the index has (an index is a tree-like structure), what kind of filtering you can expect to do on the index, etc. Honestly, any “rules of thumb” on indexes are waaaay too broad, and your mileage will vary drastically. What’s true in one scenario is absolutely false in another. Other access considerations are partitioning and subpartitioning, using clusters, materialized view rewrites, Exadata zone maps…everything you read about that’s associated with “how do I retrieve my data?” falls into this category.
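A quick sketch of testing which side of the “magic threshold” you’re on by steering the access method both ways (the hints are real; the object names are made up):

    -- Force the index path...
    SELECT /*+ INDEX(o orders_status_ix) */ COUNT(*)
    FROM   orders o
    WHERE  o.status = 'OPEN';

    -- ...then force the full scan and compare the two plans' work
    SELECT /*+ FULL(o) */ COUNT(*)
    FROM   orders o
    WHERE  o.status = 'OPEN';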

If you have:

  • a well-phrased sql query that efficiently answers the business need
  • and an excellent join order
  • and all the right access methods in place

then you will likely have a great execution plan.

If Oracle keeps picking bad execution plans, it’s likely because of one of two things:

  • Oracle doesn’t have a full understanding of how many rows it should expect to retrieve from a given database object (probably 80-97% of the time; see the sketch after this list for how to check)
  • Oracle doesn’t have a full understanding of how long an operation will take (anywhere from 3-20% of the time)
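The first cause is easy to check for: run the statement with rowsource statistics enabled and compare the optimizer’s estimates (E-Rows) against what actually came back (A-Rows). A sketch using the gather_plan_statistics hint and DBMS_XPLAN.DISPLAY_CURSOR (table and predicate are hypothetical):

    -- Execute with rowsource statistics collected
    SELECT /*+ gather_plan_statistics */ COUNT(*)
    FROM   orders
    WHERE  status = 'OPEN';

    -- Compare E-Rows vs A-Rows line by line; big gaps point at the step
    -- where the optimizer's cardinality model went wrong
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'ALLSTATS LAST'));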

If Oracle doesn’t understand how many rows it should expect to get back, it could be due to any of the following (a sketch addressing the skew and correlation cases appears after the list):

  • Stale or missing or incomplete statistics
  • Data skew (Oracle assumes that data distributions are even, unless it has additional help through things like histograms)
  • Data correlations (Oracle will assume that data filters are uncorrelated, unless given additional help through things like extended statistics)
  • Guesses (Sometimes Oracle is totally unable to know how many rows to expect back…e.g. WHERE SOME_ARBITRARY_COLUMN = :MY_BIND_VARIABLE, or complex predicates, using LIKE…Oracle literally has to guess with these situations)
  • Unlucky dynamic sampling (Oracle tries to gather a “mini” set of statistics on the fly…many times, this is the best option for, say, global temporary tables where the contents are super volatile and change frequently)
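To ground the skew and correlation bullets, here’s a sketch using the DBMS_STATS APIs (table and column names are hypothetical):

    -- Skew: request a histogram on the skewed column
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname    => USER,
        tabname    => 'ORDERS',
        method_opt => 'FOR COLUMNS STATUS SIZE 254');
    END;
    /

    -- Correlation: create extended statistics on the column group,
    -- then regather so the column group gets its own statistics
    SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'ORDERS', '(COUNTRY, STATE)')
    FROM   dual;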

Likewise, if Oracle doesn’t have a full understanding of how long an operation should take, it could be due to a lack of workload statistics, stale system statistics, or the wrong kind of system statistics.
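A sketch of refreshing system statistics with DBMS_STATS.GATHER_SYSTEM_STATS, in both its workload-capture and no-workload forms:

    -- Start capturing workload statistics...
    EXEC DBMS_STATS.GATHER_SYSTEM_STATS('START');
    -- ...let a representative workload run, then:
    EXEC DBMS_STATS.GATHER_SYSTEM_STATS('STOP');

    -- Or, if no representative workload is available:
    EXEC DBMS_STATS.GATHER_SYSTEM_STATS('NOWORKLOAD');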

If you have a great query, a great execution plan, and your query is still slow, then you might ask the question:

Can you effectively use parallelism? Certain database operations do not parallelize well…for example, it’s difficult to parallelize a (non-partitioned) index range scan. Also, there are certain overheads associated with parallelism that may make it “not worth it.” There’s overhead associated with creating the various parallel workers and giving them their tasks. There’s overhead associated with tearing the parallel workers down. There’s overhead associated with making sure the various parallel workers aren’t stepping on each other’s toes. Parallelism should be applied last, though…not first. Otherwise, you risk running a really inefficient query in parallel, and all you’ll end up doing is burning up your precious CPU, disk I/O, and memory resources.
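A minimal sketch of applying parallelism as that last step, using the statement-level PARALLEL hint (the table and degree are hypothetical):

    -- Only after the query text, join order, and access methods are sound:
    SELECT /*+ PARALLEL(8) */ COUNT(*)
    FROM   big_fact_table
    WHERE  sale_date >= DATE '2024-01-01';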

If you get this far and you still have slow SQL, your problem is no longer a bad plan. At this point you start delving into other tools, seeing where the bottlenecks are, and trying to answer the question “what’s using all my <insert bottlenecked resource here: CPU, disk I/O, memory, networking, etc.>?”
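As a starting point for that question, a sketch against the V$ACTIVE_SESSION_HISTORY view (requires the Diagnostics Pack; the sql_id is a placeholder):

    -- What has this statement spent its time on over the last 30 minutes?
    SELECT NVL(event, 'ON CPU') AS activity,
           COUNT(*)             AS samples
    FROM   v$active_session_history
    WHERE  sql_id      = 'abc123xyz'  -- placeholder sql_id
    AND    sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
    GROUP  BY event
    ORDER  BY samples DESC;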