Built byPhoenix

© 2026 Phoenix

← Blog
DatabasesPostgreSQLConcurrencyTransactionsSQLBackend

Race Conditions and the Lies of SELECT: A Guide to Transactions & Isolation Levels

Phoenix·May 14, 2026·16 min read

Race Conditions and the Lies of SELECT: A Guide to Transactions & Isolation Levels

The incident started the way the memorable ones always do: with a support ticket and a number that should have been impossible. A customer's wallet showed a balance of -1,840 units. We did not allow negative balances. There was a CHECK constraint. There was application code that explicitly refused withdrawals larger than the balance. And yet there it was, glowing red on the dashboard.

What happened was a lost update, and it is the single most common concurrency bug I see in production backends. Two withdrawal requests for the same account arrived in the same handful of milliseconds. Both read the balance: 2,000. Both checked "is 2,000 at least 1,840?" — yes. Both subtracted and wrote 160. Two withdrawals of 1,840 succeeded against an account that could only fund one. The second write did not subtract from the first; it overwrote it. We handed out 1,840 units of someone else's money because two threads agreed on a stale truth.

The terrifying part is that every line of that code passed review. It read the balance, it checked the balance, it updated the balance. The bug was not in any one statement. The bug was the gap between the statements, and the assumption that a SELECT tells you something that stays true long enough to act on.

It doesn't. That is what this post is about.


ACID, and the two letters that matter here

Relational databases sell us ACID: Atomicity, Consistency, Isolation, Durability. For concurrency, two of those letters carry the weight. Atomicity means a transaction is all-or-nothing — every statement between BEGIN and COMMIT commits together or none of it does. Isolation means concurrent transactions should not be able to observe each other's half-finished work, ideally behaving as if they ran one after another. The lie most engineers believe is that wrapping statements in BEGIN/COMMIT buys them full isolation. It does not. Isolation is a dial, the default setting is weaker than you think, and atomicity alone will not save you from a race.


The read-modify-write race, in code

Here is the withdrawal handler, distilled. It looks fine. It is not fine.

typescript
// DO NOT DO THISasync function withdraw(accountId: string, amount: number) {  const { rows } = await db.query(    'SELECT balance FROM accounts WHERE id = $1',    [accountId]  )  const balance = rows[0].balance
  if (balance < amount) throw new Error('Insufficient funds')
  await db.query(    'UPDATE accounts SET balance = $1 WHERE id = $2',    [balance - amount, accountId]  )}

The naive instinct when you learn about this bug is "ah, it is not transactional — wrap it in BEGIN/COMMIT." So you do:

sql
BEGIN;SELECT balance FROM accounts WHERE id = 1;   -- read-- app decides amount is OKUPDATE accounts SET balance = 160 WHERE id = 1;COMMIT;

And the bug is still there. Under Postgres's default isolation level, Read Committed, this changes nothing about the race. Here is why, as a two-session timeline:

sql
-- session A                              -- session BBEGIN;                                    BEGIN;SELECT balance FROM accounts  WHERE id = 1;          -- reads 2000                                          SELECT balance FROM accounts                                            WHERE id = 1;    -- also reads 2000UPDATE accounts SET balance = 160  WHERE id = 1;COMMIT;                                          UPDATE accounts SET balance = 160                                            WHERE id = 1;    -- overwrites A                                          COMMIT;

A plain SELECT does not take a lock. Both transactions read 2,000 because neither has committed a write the other can see. BEGIN/COMMIT gave you atomicity (each pair of statements commits together) and durability (it survives a crash). It did not give you mutual exclusion over the value you read. The two transactions never collide on a lock, so nothing stops them from making decisions on the same stale number.


The anomalies, one line each

Before the fixes, it helps to name the ways concurrency can lie to you. These are the classic phenomena the SQL standard talks about:

  • Dirty read — you read a row another transaction has written but not yet committed; if it rolls back, you acted on data that never existed.
  • Non-repeatable read — you read a row, someone else updates and commits, you read the same row again in the same transaction and get a different value.
  • Phantom read — you run the same WHERE query twice; the second run returns new rows because another transaction inserted matching ones.
  • Lost update — two transactions read-modify-write the same row; one write silently clobbers the other. (Our war story.)
  • Write skew — two transactions read an overlapping set, make disjoint writes based on what they read, and together break an invariant neither broke alone. Classic example: two on-call doctors each check "is at least one other doctor on call?", both see yes, and both go off duty — leaving zero.

The four isolation levels (and what Postgres actually does)

The SQL standard defines four isolation levels in terms of which anomalies they forbid. But the standard describes a minimum; engines are free to be stricter, and Postgres is. Everything below is Postgres-specific — MySQL/InnoDB, for instance, implements Repeatable Read with gap locks and behaves quite differently, so do not port these mental models blindly.

LevelDirty readNon-repeatablePhantomLost updateWrite skew
Read Uncommittedprevented*possiblepossiblepossiblepossible
Read Committed (default)preventedpossiblepossiblepossiblepossible
Repeatable Readpreventedpreventedpreventedpreventedpossible
Serializablepreventedpreventedpreventedpreventedprevented

The asterisks and footnotes are where the real understanding lives:

  • Read Uncommitted — the standard permits dirty reads here. Postgres does not implement it as a distinct mode: ask for Read Uncommitted and you silently get Read Committed. Thanks to MVCC, Postgres never shows you uncommitted data, at any level. Dirty reads are simply not a thing here.
  • Read Committed — the default. Each statement sees a snapshot taken at the moment that statement began. No dirty reads. But two statements in the same transaction can see different committed data, so non-repeatable reads, phantoms, lost updates, and write skew are all possible. This is where the overwhelming majority of lost-update bugs live, including ours.
  • Repeatable Read — in Postgres this is snapshot isolation. The entire transaction sees one snapshot frozen at its first query. No non-repeatable reads, and — going beyond the standard — no phantom reads either, because your view of the world never changes. It also kills the simple lost update: if you try to UPDATE a row that another transaction has updated and committed since your snapshot, Postgres aborts you with could not serialize access due to concurrent update (SQLSTATE 40001). What it does not catch is write skew — that is snapshot isolation's blind spot, because the two transactions touch different rows.
  • Serializable — implemented as SSI (Serializable Snapshot Isolation). It layers predicate tracking on top of snapshot isolation to detect the read/write dependency cycles that cause write skew, and aborts one transaction in the cycle with 40001 (could not serialize access due to read/write dependencies among transactions). The guarantee is strong and clean: any set of transactions that commit produce a result equivalent to having run them one at a time in some order. The price is that you must be ready to retry 40001.

The Postgres gotcha worth tattooing somewhere: both Repeatable Read and Serializable can throw 40001. If your application does not catch and retry it, you have merely traded a silent data-corruption bug for a loud HTTP 500. That is still a strict improvement — a visible failure beats invisible money loss — but only retries make it correct.


Fixing the race, three ways

1. Pessimistic locking: lock the row before you touch it

Tell the database you intend to modify this row, and make everyone else wait:

sql
BEGIN;SELECT balance FROM accounts WHERE id = $1 FOR UPDATE;-- the row is now locked; any other FOR UPDATE / UPDATE / DELETE on it blocksUPDATE accounts SET balance = balance - $2 WHERE id = $1;COMMIT;

FOR UPDATE takes a row-level exclusive lock. The second transaction's SELECT ... FOR UPDATE blocks until the first commits, then reads the fresh balance. Access is serialized, the lost update is gone, and this works at the default Read Committed. The lock modes are worth knowing:

  • FOR UPDATE — strongest row lock; use when you are about to update or delete the row.
  • FOR NO KEY UPDATE — slightly weaker; it does not block FOR KEY SHARE, which foreign-key checks acquire. A plain UPDATE that does not touch a key column takes this automatically. Prefer it when you are not changing keys, to reduce contention with FK enforcement.
  • FOR SHARE / FOR KEY SHARE — shared locks for "let me read this and guarantee nobody changes it under me", while still allowing other readers.
  • SKIP LOCKED — do not wait; skip rows already locked by someone else. This is the killer feature for work queues.
  • NOWAIT — fail immediately with an error instead of blocking.

That queue pattern is worth showing, because it is everywhere — job runners, outbox processors, schedulers:

sql
SELECT id, payload FROM jobsWHERE status = 'pending'ORDER BY created_atFOR UPDATE SKIP LOCKEDLIMIT 1;

Ten workers can run this concurrently and each grabs a different job, because each skips the rows its peers have locked. No external queue, no lock manager — just Postgres.

The trade-off with pessimistic locks: they serialize access (that is the point) and so can become a contention bottleneck, and they are a primary source of deadlocks. Hold them for as little wall-clock time as possible.

2. Optimistic locking: a version column

Assume conflicts are rare. Read without locking, and at write time verify nothing changed underneath you:

sql
-- read, no lockSELECT balance, version FROM accounts WHERE id = $1;
-- write only if the version is still what we readUPDATE accountsSET balance = $1, version = version + 1WHERE id = $2 AND version = $3;

The trick is to inspect the affected-row count. If zero rows were updated, someone bumped the version between your read and your write — you lost the race, so reload and retry the whole operation:

typescript
const res = await db.query(  'UPDATE accounts SET balance = $1, version = version + 1 WHERE id = $2 AND version = $3',  [newBalance, accountId, expectedVersion])
if (res.rowCount === 0) {  // conflict: another writer won. Reload state and retry from the top.}

Reach for this when contention is low, reads dominate, or you cannot hold a transaction and a database connection open across human "think time" — an edit form a user stares at for two minutes, say. No locks are held between read and write; the cost is wasted work when a conflict does happen. It is also the pattern most ORMs (Hibernate, Prisma's optimistic concurrency, etc.) implement for you.

3. Do it in one statement: let the database do the arithmetic

My default, and the one I reach for first. Most read-modify-write logic can be folded into a single conditional UPDATE:

sql
UPDATE accountsSET balance = balance - $1WHERE id = $2 AND balance >= $1;

There is no read-modify-write window here, because the read, the check, and the write all happen atomically inside one statement, under the row lock that UPDATE already takes. Inspect rowCount: zero means the balance >= $1 guard failed — insufficient funds — and the balance was never at risk of going negative. No version column, no explicit FOR UPDATE, no extra round trip.

The subtle reason this is safe even at Read Committed is worth internalizing: when an UPDATE collides with a concurrently-committed change to the same row, Postgres does not blindly use your statement's snapshot. It rechecks — it re-reads the latest committed version of the row and re-evaluates the WHERE clause against it (the EvalPlanQual mechanism). So the second withdrawal waits for the first to commit, then re-tests balance >= $1 against the new balance of 160, fails the guard, and updates nothing. If you can express your mutation as one conditional UPDATE, do that before reaching for anything cleverer.


Deadlocks: the price of locking

Locks fix races but introduce a new failure mode. A deadlock is a cycle of waiting: transaction A holds a lock on row 1 and wants row 2; transaction B holds row 2 and wants row 1. Neither can proceed, and neither will let go.

sql
-- session A                              -- session BBEGIN;                                    BEGIN;UPDATE accounts SET ... WHERE id = 1;     UPDATE accounts SET ... WHERE id = 2;UPDATE accounts SET ... WHERE id = 2;     UPDATE accounts SET ... WHERE id = 1;-- waits for B's lock on row 2            -- waits for A's lock on row 1--                  ^^^^^^^^^^ deadlock ^^^^^^^^^^

Postgres does not hang forever. A waiting transaction, after deadlock_timeout (1 second by default), runs deadlock detection; if it finds a cycle it picks a victim, aborts it with deadlock detected (SQLSTATE 40P01), rolls back that transaction entirely, and lets the survivor continue. How to keep it from happening:

  • Consistent lock ordering. If every transaction acquires locks in the same order — always ascending by id, so a transfer locks LEAST(from, to) first — a cycle is impossible. This is by far the most effective fix.
  • Keep transactions short. The less time you hold a lock, the smaller the window for a cycle to form.
  • Touch fewer rows, and use the weakest lock that works (FOR NO KEY UPDATE over FOR UPDATE when you are not changing keys).
  • Retry the victim. 40P01 is transient and retryable, exactly like 40001.

A retry helper for 40001 and 40P01

Serialization failures (40001) and deadlocks (40P01) are both transient: the transaction rolled back cleanly, so retrying is sound as long as the work has no non-transactional side effects. Here is the wrapper I reach for, with exponential backoff and full jitter so a thundering herd of retries does not synchronize:

typescript
import { Pool, type PoolClient } from 'pg'
const pool = new Pool()
// 40001 = serialization_failure, 40P01 = deadlock_detectedconst RETRYABLE = new Set(['40001', '40P01'])
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))
export async function inTransaction<T>(  work: (tx: PoolClient) => Promise<T>,  { isolation = 'READ COMMITTED', maxAttempts = 5 } = {}): Promise<T> {  for (let attempt = 1; ; attempt++) {    const tx = await pool.connect()    try {      await tx.query('BEGIN ISOLATION LEVEL ' + isolation)      const result = await work(tx)      await tx.query('COMMIT')      return result    } catch (err) {      await tx.query('ROLLBACK').catch(() => {})      const code = (err as { code?: string }).code
      if (code && RETRYABLE.has(code) && attempt < maxAttempts) {        const cap = Math.min(1000, 25 * 2 ** attempt) // ms ceiling        await sleep(Math.random() * cap) // full jitter        continue      }      throw err    } finally {      tx.release()    }  }}

And the call site, opting into Serializable for an operation that needs it:

typescript
await inTransaction(  async (tx) => {    await tx.query(      'UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1',      [amount, accountId]    )  },  { isolation: 'SERIALIZABLE' })

Two mistakes I see constantly: retrying forever (always cap attempts and surface a clean error when you give up), and retrying work that has non-idempotent side effects. Which leads to the rules.


Practical guidance

  • Stay on Read Committed by default. Make your writes safe with single-statement conditional updates or SELECT ... FOR UPDATE, rather than reaching for a global isolation bump. Most services never need to change the level.
  • Reach for Repeatable Read when one transaction reads the same data multiple times and must see a consistent picture — financial reports, multi-query aggregations, "read your own computation" flows.
  • Reach for Serializable when correctness hinges on an invariant spanning rows that no single lock protects: booking systems, "at most N of X", balances split across rows. It is the easiest correct option for write skew; you pay for it with the retry loop above.
  • Keep transactions short. Open late, commit early. An open transaction pins a connection, holds its locks, and keeps an MVCC snapshot alive that blocks VACUUM from cleaning up dead tuples — long transactions bloat your tables.
  • Never make network calls inside a transaction. No HTTP to a payment processor, no publishing to Kafka, no waiting on user input. You would pin a connection and hold locks across unbounded latency — and if you then retry on 40001, you re-fire that side effect. Do external work before or after the transaction, or use the transactional outbox pattern: write an intent row in the transaction, dispatch it from a separate worker.
  • Make retried work idempotent. If a transaction can run twice, ensure running it twice is harmless — idempotency keys, natural dedupe, INSERT ... ON CONFLICT DO NOTHING.

Wrap-up

The bug at the top of this post was not a Postgres bug. Postgres did precisely what Read Committed promises. The bug was mine — assuming a SELECT reports the truth and that the truth holds still long enough to act on. It does not. By the time your UPDATE runs, the value you read may already be a lie.

Once that clicks, the fixes are mechanical. Do the mutation atomically in one conditional statement. Lock the row you intend to change. Or raise the isolation level until the database catches the conflict for you — and retry when it does. Reach for the cheapest tool that makes the race genuinely impossible, and never let a SELECT make a decision it has no power to enforce.


Further reading

  • PostgreSQL documentation — "Transaction Isolation" (Concurrency Control chapter). The canonical, version-accurate reference for exactly what each level does in Postgres, including SSI.
  • Martin Kleppmann, Designing Data-Intensive Applications, Chapter 7 ("Transactions"). The clearest treatment of weak isolation, lost updates, and write skew I know of.
  • The Jepsen analyses (jepsen.io). What isolation guarantees databases actually deliver under partition and load, versus what their marketing claims.
← All postsShare on X
// DO NOT DO THISasync function withdraw(accountId: string, amount: number) {  const { rows } = await db.query(    'SELECT balance FROM accounts WHERE id = $1',    [accountId]  )  const balance = rows[0].balance
  if (balance < amount) throw new Error('Insufficient funds')
  await db.query(    'UPDATE accounts SET balance = $1 WHERE id = $2',    [balance - amount, accountId]  )}
BEGIN;SELECT balance FROM accounts WHERE id = 1;   -- read-- app decides amount is OKUPDATE accounts SET balance = 160 WHERE id = 1;COMMIT;
-- session A                              -- session BBEGIN;                                    BEGIN;SELECT balance FROM accounts  WHERE id = 1;          -- reads 2000                                          SELECT balance FROM accounts                                            WHERE id = 1;    -- also reads 2000UPDATE accounts SET balance = 160  WHERE id = 1;COMMIT;                                          UPDATE accounts SET balance = 160                                            WHERE id = 1;    -- overwrites A                                          COMMIT;
BEGIN;SELECT balance FROM accounts WHERE id = $1 FOR UPDATE;-- the row is now locked; any other FOR UPDATE / UPDATE / DELETE on it blocksUPDATE accounts SET balance = balance - $2 WHERE id = $1;COMMIT;
SELECT id, payload FROM jobsWHERE status = 'pending'ORDER BY created_atFOR UPDATE SKIP LOCKEDLIMIT 1;
-- read, no lockSELECT balance, version FROM accounts WHERE id = $1;
-- write only if the version is still what we readUPDATE accountsSET balance = $1, version = version + 1WHERE id = $2 AND version = $3;
const res = await db.query(  'UPDATE accounts SET balance = $1, version = version + 1 WHERE id = $2 AND version = $3',  [newBalance, accountId, expectedVersion])
if (res.rowCount === 0) {  // conflict: another writer won. Reload state and retry from the top.}
UPDATE accountsSET balance = balance - $1WHERE id = $2 AND balance >= $1;
-- session A                              -- session BBEGIN;                                    BEGIN;UPDATE accounts SET ... WHERE id = 1;     UPDATE accounts SET ... WHERE id = 2;UPDATE accounts SET ... WHERE id = 2;     UPDATE accounts SET ... WHERE id = 1;-- waits for B's lock on row 2            -- waits for A's lock on row 1--                  ^^^^^^^^^^ deadlock ^^^^^^^^^^
import { Pool, type PoolClient } from 'pg'
const pool = new Pool()
// 40001 = serialization_failure, 40P01 = deadlock_detectedconst RETRYABLE = new Set(['40001', '40P01'])
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))
export async function inTransaction<T>(  work: (tx: PoolClient) => Promise<T>,  { isolation = 'READ COMMITTED', maxAttempts = 5 } = {}): Promise<T> {  for (let attempt = 1; ; attempt++) {    const tx = await pool.connect()    try {      await tx.query('BEGIN ISOLATION LEVEL ' + isolation)      const result = await work(tx)      await tx.query('COMMIT')      return result    } catch (err) {      await tx.query('ROLLBACK').catch(() => {})      const code = (err as { code?: string }).code
      if (code && RETRYABLE.has(code) && attempt < maxAttempts) {        const cap = Math.min(1000, 25 * 2 ** attempt) // ms ceiling        await sleep(Math.random() * cap) // full jitter        continue      }      throw err    } finally {      tx.release()    }  }}
await inTransaction(  async (tx) => {    await tx.query(      'UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1',      [amount, accountId]    )  },  { isolation: 'SERIALIZABLE' })