Suppose you have a table full of tweets.

You would like a view that will show you the most recent tweet of every username, and you want it done fast and efficiently!  (Suppose that the table has millions of rows)

What’s the best way to get this information?  I don’t know.  But I think I’ve got a way that’s pretty fast.

Previously, I would’ve recommended doing something like this:

Which would’ve certainly gotten the job done.  But if you do an explain plan on the query, you can see that you’re having to read the whole table (full table scan), and then sort it (window sort pushed rank).

Is there a better way?  Yes.  The way I’m about to show you is most effective if the USERNAME column has a very low cardinality, but I’ve also seen where it’s very effective when cardinalities are lower.  Let’s start with an index.

So if you think about what an index is, for a second, an index (like a table) is a bunch of rows.   Unlike an ordinary heap table, though, an index is arranged in a sorted order to allow finding values to be done faster.  At the very end of each row in the index is a ROWID that tells which row in the table the index is pointing to.  So we’re interested in finding the “most recent” row, using our above index–how would we do that?  Let’s look at this query:

That looks like it will find the timestamps that we’re interested in.  And if you look at the plan you’ll notice that this query is being answered completely by the index–no table access is ever required, Oracle just uses the index for the above query.

So logically speaking, it seems like we ought to be able to trace through the tree of an index and eventually land on the row with the maximum timestamp for any given user…but once we’ve done that, how do we jump from the index to the table where the actual TWEET is?  The answer is, of course, by the ROWID.  The ROWID tells Oracle *exactly* where to look…what data file, which block, what row, to find the data.  ROWID is the fastest way to access a single row in a table.  If you look at the index we created, it has three “columns”….USERNAME, to_char(tweet_time, ‘YYYY-MM-DD HH24:MI:SS.FF’), and the ROWID.  So one way to get the ROWID of the maximum date for each user would be something like this:

Aha!  And, as you can see, we’re still only using the index at this point.

So how do we now jump from the index to the table in order to get the tweet?  Well, we know that ROWIDs are always 18 characters long.  So, we write our query like this:

Tah. Dah.  Now you see why we used a TO_CHAR around the date…because concatenating the date with the ROWID would necessitate that it be turned into a string anyway, and otherwise a funkly NLS_TIMESTAMP_FORMAT setting might distort the order of the table.

Let’s turn this query into a view and get the explain plan:

Notice that the cost of the new plan is 11, compared to the cost of the old ROW_NUMBER() plan of 17.  Bytes for the new plan is 174, bytes for the old plan was 1640.  You can imagine how the costs would scale out as the number of records in the table increased.