Safe transactions are usually referred to as ACID.
Atomicity
All writes succeed or none of them do. Imagine having to withdraw 10 to Patrick. Both of them have to succeed.
Consistency
Changes made within a transaction are consistent with database constraints. All rules, constraints and triggers. If the data gets into an illegal state, the whole transaction fails.
Isolation
No race conditions. All transactions are executed independently of each other. If two transactions for a withdrawal of $100 run at the same time, at the end the balance should be -200.
Durability
Once the transaction completes and changes are written to the DB, they are persisted.
Write Ahead Log
We use a WAL to achieve atomicity, consistency and durability. Isolation, on the other hand, is hard and takes long to implement. Every transaction, we write all operations in a transaction, and at the end we write commit. Once we write commit, we know all the writes we put in the WAL are āsafeā. If the system were to shutdown, weād go through the WAL and replay all transactions to make sure everything is good.
Levels of Transaction Isolation
Read Committed Isolation
A database is said to be read committed isolated if it protects against these two race conditions:
- Dirty Write
- letās say two users try to write to the same row, in this order:
- user 1 sets the name as āpatrickā
- user 2 as āgeorgeā
- user 2 sets the address as āseattleā
- user 2 commits
- user 1 sets address as ālos angelesā
- user 1 commits
- and if thereās no type of thread safety, the result could be:
- name = āgeorgeā, address = ālos angelesā
- which is not what we want. to fix, we use a lock for every row.
- user has to acquire a lock before writing to it.
- letās say two users try to write to the same row, in this order:
- Dirty read
- Letās say this happens:
- thread 1 removes 10 from patrickās balance
- thread 2 (patrick) reads balance (-10!!)
- thread 1 pays patrick 10 bucks
- thread 1 commits
- thread 2 commits
- what happened? patrick saw a balance of -10, even though it shouldāve been 0. patrickās balance shouldnāt have updated UNTIL thread 1 committed.
- So how to fix? well, during a transaction, for every operation that happens, the thread actually updates a temporary value.
- when the thread commits, then the original value points to the temporary (new) value.
- Letās say this happens:
Snapshot Isolation
Long reads (more than a minute long) risk to see inconsistent data throughout the transaction. Letās say row 1 has 100 dollars, and row 100 has 100 dollars. During the reading, someone writes to the table (reads are non-blocking) and moves the 100th rowās money to the 1st row. The read transaction couldāve read 100 dollars for row 1 and then 0 dollars for row 100 since that value was changed by the write operation while the read was ongoing.
How do you fix this? Simple, reads read a snapshot of the data. Snapshots are implemented by storing old values of modified data. Every write, you keep it in the WAL with a certain timestamp. So basically for every row you will have a list of different values in time. If you start a read transaction at timestamp 15, you will only read values with a timestamp of 15 max. If a write happens at timestamp 20, you wonāt read that value.
Write Skew and Phantom Writes
Write Skew
Letās assume 2 different threads want to change their respective rowās status to āinactiveā only if 1 other row is active. Right now, theyāre the only 2 active rows.
If they both change themselves to inactive at the same time, thereāll be no active rows left! This is a problem. We fix this by using a lock. Threads acquire locks on ALL active rows, check if thereās an active one, then change their status, then release locks.
Phantom Writes
Similar example, letās assume 2 threads want to add a new row if thereās no other row with the same name in use. If both do at the same time, things break. But thereās no lock to acquire, how do we fix this? Well, we do materialize conflicts! This means, we insert the possible names as rows, with their own respective lock.
Threads just have to acquire a lock, check if the row is used or not, if itās not used, then use it.
Serializability
We saw how weak isolation is hard to debug and implemented differently among databases. Also, detecting race conditions is hard and difficult to tell looking at code.
Serializable isolation is regarded as the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
ā¦
