The Index You're Missing: A Practical Guide to Postgres Performance
A few years ago I got paged at 2am because an internal dashboard had stopped loading. Nothing had been deployed; no config had changed. The query behind it had run unchanged for two years. It used to take 40ms. Now it took 12 seconds, and under load it piled up connections until the pool starved and the whole API fell over.
The cause was boring, which is to say it was the usual cause. The transactions table had quietly grown from 50,000 rows to 2 million. The query filtered on account_id, and there was no index on it. At 50,000 rows, scanning the whole table was fast enough that nobody noticed. At 2 million, that same full scan was a wall.
One CREATE INDEX took the query from 12 seconds back to under 2ms. That's the entire value proposition here. Indexing is the highest-leverage skill a backend engineer can have for database performance, and most of the slow queries I've debugged in production weren't a hardware problem or a "we need to shard" problem — they were a missing index, a wrong index, or an index the planner couldn't use. Here's how I think about it.
Seq Scan vs Index Scan: What Actually Happens
When you run a query like this with no useful index:
Postgres has exactly one option: read every row in the table and check whether customer_id = 4217. That's a sequential scan, and the work is proportional to the whole table, not the result. Finding 14 rows in a 2-million-row table means reading all 2 million.
An index changes the shape of that work. It's a separate, ordered structure mapping column values to the physical location (the tuple ID, or ctid) of matching rows on disk. Instead of reading the whole table, Postgres walks the index to the handful of locations it cares about and jumps straight there. The work becomes proportional to the rows you actually want, plus a logarithmic search cost — the difference between 12 seconds and 2ms.
B-tree: The Default, and the One You'll Use 95% of the Time
When you write CREATE INDEX with no type specified, you get a B-tree:
A B-tree keeps values sorted in a balanced tree, which makes it good at a specific, very common set of operations:
- Equality:
WHERE customer_id = 4217 - Range:
WHERE created_at >= '2026-01-01',BETWEEN,<,> - IN lists:
WHERE status IN ('pending', 'paid') - Sorting:
ORDER BY created_atcan read straight from the index in order, skipping a sort step entirely - Prefix matching:
WHERE email LIKE 'phoenix%'(anchored at the start, no leading wildcard)
If your filter is one of those, a B-tree is almost always the right answer. The exotic types (GIN, GiST, BRIN) exist for cases B-trees handle badly, but reach for them deliberately, not by default.
Reading EXPLAIN: The Only Way to Actually Know
Stop guessing about indexes — Postgres tells you exactly what it's doing if you ask. EXPLAIN shows the planner's chosen plan and cost estimates; EXPLAIN (ANALYZE, BUFFERS) actually runs the query and reports real timing and I/O.
Without an index, you get something like:
Here's how I read that, line by line:
Seq Scan— reading the whole table. A red flag on a big table with a selective filter.cost=0.00..35819.00— the planner's estimate in arbitrary units: startup cost, then total cost. These are estimates, not milliseconds, useful only for comparing plans against each other.rows=12vsactual ... rows=14— estimated vs real row counts. When they differ wildly (100x+), your statistics are stale and the planner is flying blind.Rows Removed by Filter: 1999986— it threw away two million rows to keep 14. Pure waste, and the clearest signal you need an index.Buffers: ... read=18627— 18,627 8KB pages read from disk. Reason enough to always addBUFFERS.Execution Time: 182.34 ms— the real wall-clock number you care about.
Add the index and run it again:
Four buffers instead of 18,627. Index Cond (not Filter) means the condition was satisfied by the index itself rather than by reading rows and discarding them — that's the index doing its job.
The Third Option: Bitmap Heap Scan
There's a middle ground you'll see constantly:
When an index matches many rows, jumping to each individually causes random I/O all over the disk. So Postgres uses the index to build a bitmap of which pages contain matches, then reads those pages in physical order. It's a hybrid between a pure index scan and a seq scan, and it's also how Postgres combines two indexes (you'll see BitmapAnd / BitmapOr nodes). A bitmap scan is usually fine and often optimal.
Why the Planner Sometimes Ignores Your Index
This trips people up: you created an index, but EXPLAIN still shows a seq scan, so you assume something's broken. Usually nothing is. If a query returns a large fraction of the table — say more than 5-10% of rows — a sequential scan is genuinely faster than millions of random index lookups plus heap fetches, and the planner chooses correctly. This is low selectivity: the index doesn't narrow things enough to be worth it. An index on a boolean, or a status column where 90% of rows share one value, is mostly useless for that common value (a partial index is the fix — keep reading).
Composite Indexes and Why Column Order Is Everything
A multi-column index indexes the combination of columns, in the order you list them:
The critical rule is the leftmost prefix. This index can serve:
WHERE customer_id = 4217(uses the first column)WHERE customer_id = 4217 AND status = 'paid'(uses both)
But it is essentially useless for:
WHERE status = 'paid'(skips the leading column)
Think of it like a phone book sorted by last name, then first name. You can find everyone named "Okafor", and "Okafor, Ada" specifically — but the book is no help at all for finding everyone whose first name is "Ada", because first name is the second sort key.
So which order? My rule of thumb:
- Equality columns before range columns. For
WHERE customer_id = 4217 AND created_at > '2026-01-01', build(customer_id, created_at): equality narrows to the customer, then the range oncreated_atis contiguous. Reverse it to(created_at, customer_id)and the leading range scatters the customer match across the whole index. - Most selective / most commonly filtered column first, subject to rule 1.
- Don't build both
(a, b)and(a)— the composite already covers queries onaalone via its leftmost prefix. The standalone index onais redundant weight.
Covering Indexes and Index-Only Scans
Normally an index scan is two steps: find the row's location in the index, then visit the table heap to fetch the columns. If the index already contains every column the query needs, Postgres skips the second step. That's an index-only scan — a real win for hot queries.
You can pad an index with extra columns using INCLUDE:
Now this query never touches the table:
Heap Fetches: 0 is the goal. INCLUDE columns live only in the leaf nodes — they can't be searched or sorted on, but they're cheap to carry and don't bloat the tree's internal pages. Use INCLUDE for columns you SELECT but never filter or sort on. One caveat: index-only scans rely on the visibility map, so a heavily churned table that hasn't been vacuumed recently may still show some heap fetches.
Partial Indexes: Index Only What You Query
This is one of the most underused features in Postgres, and one of my favorites: you put a WHERE clause on the index itself so it includes only the rows you actually query.
Classic case — a "soft delete" column where almost every query filters out deleted rows:
Or a job queue where you only ever look at pending work:
The payoffs are real. If 95% of your jobs are 'done', this index is roughly 20x smaller than a full one on status and faster to scan — and it rescues the low-selectivity problem from earlier: a status column that's useless to index normally becomes highly effective when you index only the rare, interesting value. The catch: the query's WHERE clause must clearly imply the index predicate. WHERE status = 'pending' matches; a parameterized WHERE status = $1 may not, because the planner can't prove $1 equals 'pending' at plan time.
Expression Indexes: When You Filter on a Function
Here's a bug I've watched senior engineers chase for an hour. You have an index on email, but this query does a seq scan:
The index is on email, but you're searching on lower(email) — and to Postgres that's a different value. Wrapping a column in a function hides the raw column from its index. The fix is to index the expression itself:
Now the query uses the index. The rule generalizes: any function or transformation applied to an indexed column in your WHERE clause disables a plain index on that column. WHERE date(created_at) = '2026-06-05' won't use an index on created_at — rewrite it as a range (created_at >= '2026-06-05' AND created_at < '2026-06-06') or build an expression index. Same story with WHERE amount_cents / 100 > 50 or WHERE substr(code, 1, 3) = 'ABC'.
Indexing for ORDER BY ... LIMIT (a.k.a. Pagination)
Because B-trees are sorted, they can satisfy an ORDER BY without a separate sort step — but only if the sort direction matches. A feed query like this:
wants an index that's already in the right order:
The planner walks straight to author_id = 88, reads 20 entries already sorted newest-first, and stops — no sort step, no scanning the author's whole history. Without it, Postgres fetches every matching row and sorts them just to discard all but 20, which is brutal for prolific authors.
For deep pagination, drop OFFSET entirely. OFFSET 100000 still reads and discards 100,000 rows. Use keyset (cursor) pagination instead, which stays fast at any depth because it uses the same index to seek directly:
GIN, Trigrams, and the Non-B-tree World (Briefly)
B-trees can't index "is this value somewhere inside this document/array/string". For that, Postgres has GIN (Generalized Inverted Index), which maps each contained element back to the rows holding it.
jsonb:
Arrays:
Full-text search:
Substring / fuzzy LIKE with trigrams: A leading-wildcard LIKE '%phoenix%' can never use a normal B-tree. The pg_trgm extension breaks strings into three-character chunks and indexes those:
These are powerful but heavier to build and maintain than B-trees — use them when the access pattern demands it, not preemptively.
Indexes Aren't Free: The Cost of Over-Indexing
Every index makes reads faster and writes slower. That trade-off is the whole game, and beginners almost always over-index.
- Write amplification: Every
INSERTand everyUPDATEtouching an indexed column must update every relevant index. Ten indexes means an insert does eleven writes — that adds up fast on a write-heavy table. - Storage: Indexes can easily outweigh the table itself; I've seen indexes 3x the size of the data.
- Bloat: Under MVCC, updates and deletes leave dead tuples in both the table and its indexes until
VACUUMreclaims them. More indexes, more places for bloat. - Planning time: More candidate indexes, more time spent choosing a plan.
My discipline: index the read patterns your app actually runs, not every column — especially the hot queries and the columns in WHERE / JOIN / ORDER BY clauses. Don't speculatively index a column because it "might" get queried someday.
Why Isn't My Index Being Used? A Checklist
When you're sure an index should apply but EXPLAIN disagrees, it's almost always one of these:
- Type mismatch. Comparing a
bigintcolumn to a text literal, or avarcharto aninteger, can force an implicit cast that hides the column from its index. Match types exactly. - A function on the column.
lower(col),date(col),col::text— covered above. Index the expression or rewrite the predicate. - Leading wildcard LIKE.
LIKE '%foo'orLIKE '%foo%'can't use a normal B-tree. Use trigrams. - OR conditions across different columns.
WHERE a = 1 OR b = 2sometimes can't use a single index well; the planner may seq scan. Two separate indexes (which Postgres combines with aBitmapOr), or rewriting as aUNION, often helps. - Low selectivity. The query matches too much of the table; a seq scan really is faster. Not a bug.
- Stale statistics. The planner relies on table stats to estimate row counts. After a big bulk load or mass update, those go stale and plans go bad. Run
ANALYZE:
When estimated rows and actual rows in your EXPLAIN ANALYZE output diverge by orders of magnitude, suspect stale stats first.
Maintenance: Finding Dead Weight and Building Safely
Find unused indexes
Postgres tracks how often each index is used in pg_stat_user_indexes. Those with idx_scan = 0 are removal candidates — they cost writes and storage while earning nothing:
The pg_constraint exclusion keeps you from flagging indexes backing primary keys and unique constraints. Also confirm the stats have been accumulating long enough to be meaningful — an index used only by a monthly report will look unused on any given Tuesday.
Rebuild bloated indexes
Over time, churn bloats indexes and slows them. REINDEX rebuilds them; on Postgres 12+, do it without locking out writes:
Build indexes without taking down production
This one bites people. A plain CREATE INDEX takes an ACCESS EXCLUSIVE-style lock that blocks writes to the table for the entire build — on a large, busy table, that's minutes of downtime. Always use CONCURRENTLY in production:
It builds without blocking reads or writes. The trade-offs: it's slower, can't run inside a transaction block, and if it fails partway it leaves an INVALID index you must drop (also CONCURRENTLY) and rebuild — find them with SELECT ... FROM pg_index WHERE NOT indisvalid.
The Practical Checklist
When a query is slow, this is the loop I run:
- Run
EXPLAIN (ANALYZE, BUFFERS)— never guess. Look forSeq Scanon big tables andRows Removed by Filter. - Check estimated vs actual rows. Big gap means
ANALYZEthe table. - Index the columns in
WHERE,JOIN, andORDER BY— equality columns first, range columns last. - Match
ORDER BYdirection in the index for sortedLIMITqueries; prefer keyset overOFFSETfor deep pagination. - Consider a partial index when most rows share one value.
- Consider
INCLUDEcolumns to get an index-only scan on a hot query. - Watch for index-killers: functions on columns, type mismatches, leading-wildcard
LIKE. - Reach for GIN (jsonb, arrays, full-text) and
pg_trgm(substring search) only when B-trees can't do the job. - Audit
pg_stat_user_indexesperiodically and drop the dead weight. - Build with
CREATE INDEX CONCURRENTLYon any table that takes live traffic.
Indexing isn't magic and it isn't guesswork. It's reading the plan, understanding what the planner is choosing and why, and giving it a better option. Do that consistently and the 2am pages mostly stop.
Further Reading
- — the canonical reference, including all the index types.
- — how to read query plans properly.
- — the best free resource on SQL indexing, database-agnostic but deeply applicable to Postgres.
- — paste an
EXPLAIN ANALYZEplan and get it visualized with the slow nodes highlighted.