I read one of Tom Kyte’s books a while back, and he mentioned that in Oracle, everything is done with read consistency. Everything from SELECTs to INSERTs, UPDATEs, and DELETEs…the whole nine yards is done with a read consistency that looks at the table as it existed at the beginning of the query. But is this true for update statements, where you have to both read and modify the data?
Here’s a thought exercise: Suppose you have a very large table with a primary key (thus, a primary key index) on it.
You want to update the majority of the rows in the table that meet certain criteria, and the best…possibly even the only way to do that would be to full scan the table.
So you kick off this query, and at 12:00 noon, your Oracle server process starts at the beginning of the table and starts scanning, locking, and updating records all the way through to the end, setting the undo data aside in an undo segment (let’s say the update will run a total of 5 minutes, until 12:05).
One minute goes by, when another update is kicked off at 12:01. This query knows that it wants to update one (and only one) row, and it knows the primary key of the row it’s trying to update, so it’s able to jump onto the primary key index and update instantly.
The thing is, the one row that the 12:01 update is trying to update…collides with one of the rows that the 12:00 update statement is trying to touch.
So what happens in this circumstance?
In my mind, for perfect read consistency to occur, the row that the 12:01 query is trying to update would be locked until the 12:00 query either committed or rolled back, as the current state of that row is technically “in flux.” What does Oracle actually do in this scenario? Let’s see.
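For reference, the plain row-lock case is easy to demonstrate with two sessions and no full scan at all. Here’s a minimal sketch of the behavior I’d expect, using a hypothetical table t (my own illustration, not the test table built below):

-- Session 1: update one row and leave the transaction open
update t set val = 'first' where id = 1;
-- (no commit yet)

-- Session 2: this statement blocks on the row lock
-- until session 1 commits or rolls back
update t set val = 'second' where id = 1;

-- Session 1:
commit;
-- Session 2 now proceeds and overwrites the row with 'second'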
I’m using Oracle database version 11.2.
First, I’ll start by creating a table.
create table kaley.drop_me (
    id    number primary key,
    data  varchar2(250)
);
Then, I’ll load it with *lots* of data (well…lots for my little rinky-dink computer).
begin
    for v_index in 1 .. 2000000 loop
        insert into kaley.drop_me (id, data)
        values (v_index,
                case when mod(v_index, 10000) = 0
                     then rpad('Y', 250, 'Y')
                     else rpad('X', 250, 'X')
                end);
        if mod(v_index, 10000) = 0 then
            commit;
        end if;
    end loop;
    commit;
end;
/
The table will have 2 million rows; 0.01% of them will be ‘YYYYYYYYYY…..’ while the other 99.99% will be ‘XXXXXXXXXX….’ (just to give the table some padding, and make a full table scan take a while).
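As a quick sanity check on that skew (my own addition, not part of the original test scripts), something like this should come back with about 200 ‘Y’ rows against the rest:

-- Sanity check on the data distribution
select substr(data, 1, 1) as first_char, count(*) as row_count
  from kaley.drop_me
 group by substr(data, 1, 1)
 order by row_count;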
I will have two sessions, each running its own script. Script “A,” which will be kicked off first, looks like this:
-- Script "A"
set echo on
set timing on

select to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') starttime,
       current_scn
  from v$database;

update kaley.drop_me
   set data = 'FULL TABLE SCAN! Start time:' ||
              to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') ||
              ' SCN: ' || (select current_scn from v$database)
 where data like 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY%';

commit;

select to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') stoptime,
       current_scn
  from v$database;
…and script “B,” which will be kicked off after script “A”:
-- Script "B"
set echo on
set timing on

select to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') starttime,
       current_scn
  from v$database;

update kaley.drop_me
   set data = 'INDEX! Start time:' ||
              to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') ||
              ' SCN: ' || (select current_scn from v$database)
 where id = 100000;

commit;

select to_char(systimestamp, 'YYYY-MM-DD HH24:MI:SS.FF') stoptime,
       current_scn
  from v$database;
If we run a SELECT that combines the predicates from the two updates, we can see that the same row does, in fact, fall under both of them.
SQL> select count(*)
     from kaley.drop_me
     where id = 100000
     and data like 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY%';

  COUNT(*)
----------
         1

Elapsed: 00:00:00.008
So to prove this point, we’ll want to flush the buffer cache (never do this on a live production system) because we want our full table scan to run as long as possible.
SQL> alter system flush buffer_cache;

System FLUSH altered.

Elapsed: 00:00:06.330
Alright! We’re ready to roll. Let’s see what happens.
[Side-by-side session output: Script “A” – Full Table Scan | Script “B” – Index]
You can see that script “B” was started about half a second after script “A,” yet it returned immediately. Who won?
SQL> select data from kaley.drop_me where id = 100000;

DATA
--------------------------------------------------------------------------------
INDEX! Start time:2017-03-18 20:19:57.837000 SCN: 13568815

Elapsed: 00:00:00.008
The index update was the clear winner for “claiming” the row, even though the full table scan started earlier and should’ve had read consistency as of that earlier point.
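Incidentally, if I’d wanted to confirm that “B” never sat on a row-lock wait at all, I could have watched both sessions from a third one while the updates ran. A rough sketch (assuming both test sessions log in as KALEY); if “B” were blocked, you’d expect to see an “enq: TX - row lock contention” wait and a non-null BLOCKING_SESSION:

-- Run from a third session while the two updates are in flight (my own check)
select sid, sql_id, event, blocking_session
  from v$session
 where username = 'KALEY';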
But hold up. It gets weirder.
SQL> select distinct data from kaley.drop_me where data like 'FULL%';

DATA
--------------------------------------------------------------------------------
FULL TABLE SCAN! Start time:2017-03-18 20:19:57.124000 SCN: 13568817

Elapsed: 00:00:00.370
The start time above is *before* the start time of the index update. The SCN is *after*.
It’s true that committing changes the SCN associated with a row. But what really throws me for a loop here is that committing shouldn’t change a value in a column.
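(Side note: if you want to see the commit SCN Oracle has recorded for that row, the ORA_ROWSCN pseudocolumn gives a rough answer. This is my own aside rather than part of the test, and since the table wasn’t created with ROWDEPENDENCIES, the value is tracked at the block level, so treat it as approximate.)

-- Approximate commit SCN recorded for the row (block-level unless ROWDEPENDENCIES is set)
select ora_rowscn, data
  from kaley.drop_me
 where id = 100000;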
The only thing I can think of where this makes sense is this:
- Update query in script “A” runs for a bit.
- Update query in script “B” updates the row.
- Update query in script “A” reaches the row that “B” has updated, and goes “Dooooh fiddlesticks! My read consistency is ruined because this update has touched the row since I started my query.”
- The update goes back up to the start of the table and starts scanning again, updating rows with the new values.
If this is true, then in theory, you could keep “ping-ponging” the reader back up to the start of the table by making changes to rows it would’ve grabbed. Or, maybe Oracle made the query smart enough to go through and lock all the rows before it does the update, so that it gets ping-ponged a maximum of twice?
I’m not really sure on this one…
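One way I could test the restart theory, though (a rough sketch, not something I’ve run here): put a counter in a row-level trigger and compare how many times the update actually touches rows against the number of rows it reports as updated. If the full-scan update restarts, the trigger should fire more times than the final row count.

-- Sketch only: count how many times updates visit rows (package state is per-session)
create or replace package touch_count as
    n number := 0;
end touch_count;
/

create or replace trigger kaley.drop_me_bu
    before update on kaley.drop_me
    for each row
begin
    touch_count.n := touch_count.n + 1;
end;
/

-- Re-run the two-session test, then, still in session "A":
set serveroutput on
begin
    dbms_output.put_line('rows touched by this session: ' || touch_count.n);
end;
/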