Built byPhoenix

© 2026 Phoenix

← Blog
PostgreSQLDatabasesPerformanceIndexingSQLBackend

The Index You're Missing: A Practical Guide to Postgres Performance

Phoenix·June 5, 2026·16 min read

The Index You're Missing: A Practical Guide to Postgres Performance

A few years ago I got paged at 2am because an internal dashboard had stopped loading. Nothing had been deployed; no config had changed. The query behind it had run unchanged for two years. It used to take 40ms. Now it took 12 seconds, and under load it piled up connections until the pool starved and the whole API fell over.

The cause was boring, which is to say it was the usual cause. The transactions table had quietly grown from 50,000 rows to 2 million. The query filtered on account_id, and there was no index on it. At 50,000 rows, scanning the whole table was fast enough that nobody noticed. At 2 million, that same full scan was a wall.

One CREATE INDEX took the query from 12 seconds back to under 2ms. That's the entire value proposition here. Indexing is the highest-leverage skill a backend engineer can have for database performance, and most of the slow queries I've debugged in production weren't a hardware problem or a "we need to shard" problem — they were a missing index, a wrong index, or an index the planner couldn't use. Here's how I think about it.


Seq Scan vs Index Scan: What Actually Happens

When you run a query like this with no useful index:

sql
SELECT * FROM orders WHERE customer_id = 4217;

Postgres has exactly one option: read every row in the table and check whether customer_id = 4217. That's a sequential scan, and the work is proportional to the whole table, not the result. Finding 14 rows in a 2-million-row table means reading all 2 million.

An index changes the shape of that work. It's a separate, ordered structure mapping column values to the physical location (the tuple ID, or ctid) of matching rows on disk. Instead of reading the whole table, Postgres walks the index to the handful of locations it cares about and jumps straight there. The work becomes proportional to the rows you actually want, plus a logarithmic search cost — the difference between 12 seconds and 2ms.


B-tree: The Default, and the One You'll Use 95% of the Time

When you write CREATE INDEX with no type specified, you get a B-tree:

sql
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

A B-tree keeps values sorted in a balanced tree, which makes it good at a specific, very common set of operations:

  • Equality: WHERE customer_id = 4217
  • Range: WHERE created_at >= '2026-01-01', BETWEEN, <, >
  • IN lists: WHERE status IN ('pending', 'paid')
  • Sorting: ORDER BY created_at can read straight from the index in order, skipping a sort step entirely
  • Prefix matching: WHERE email LIKE 'phoenix%' (anchored at the start, no leading wildcard)

If your filter is one of those, a B-tree is almost always the right answer. The exotic types (GIN, GiST, BRIN) exist for cases B-trees handle badly, but reach for them deliberately, not by default.


Reading EXPLAIN: The Only Way to Actually Know

Stop guessing about indexes — Postgres tells you exactly what it's doing if you ask. EXPLAIN shows the planner's chosen plan and cost estimates; EXPLAIN (ANALYZE, BUFFERS) actually runs the query and reports real timing and I/O.

sql
EXPLAIN (ANALYZE, BUFFERS)SELECT * FROM orders WHERE customer_id = 4217;

Without an index, you get something like:

text
Seq Scan on orders  (cost=0.00..35819.00 rows=12 width=84)                    (actual time=0.41..182.30 rows=14 loops=1)  Filter: (customer_id = 4217)  Rows Removed by Filter: 1999986  Buffers: shared hit=192 read=18627Planning Time: 0.094 msExecution Time: 182.34 ms

Here's how I read that, line by line:

  • Seq Scan — reading the whole table. A red flag on a big table with a selective filter.
  • cost=0.00..35819.00 — the planner's estimate in arbitrary units: startup cost, then total cost. These are estimates, not milliseconds, useful only for comparing plans against each other.
  • rows=12 vs actual ... rows=14 — estimated vs real row counts. When they differ wildly (100x+), your statistics are stale and the planner is flying blind.
  • Rows Removed by Filter: 1999986 — it threw away two million rows to keep 14. Pure waste, and the clearest signal you need an index.
  • Buffers: ... read=18627 — 18,627 8KB pages read from disk. Reason enough to always add BUFFERS.
  • Execution Time: 182.34 ms — the real wall-clock number you care about.

Add the index and run it again:

text
Index Scan using idx_orders_customer_id on orders  (cost=0.43..40.55 rows=12 width=84)  (actual time=0.028..0.071 rows=14 loops=1)  Index Cond: (customer_id = 4217)  Buffers: shared hit=4Execution Time: 0.103 ms

Four buffers instead of 18,627. Index Cond (not Filter) means the condition was satisfied by the index itself rather than by reading rows and discarding them — that's the index doing its job.

The Third Option: Bitmap Heap Scan

There's a middle ground you'll see constantly:

text
Bitmap Heap Scan on orders  (cost=12.05..1834.20 rows=4800 width=84)                            (actual time=0.21..1.88 rows=5120 loops=1)  Recheck Cond: (status = 'pending'::text)  Heap Blocks: exact=478  ->  Bitmap Index Scan on idx_orders_status  (cost=0.00..11.93 rows=4800 width=0)                                              (actual time=0.14..0.14 rows=5120 loops=1)        Index Cond: (status = 'pending'::text)

When an index matches many rows, jumping to each individually causes random I/O all over the disk. So Postgres uses the index to build a bitmap of which pages contain matches, then reads those pages in physical order. It's a hybrid between a pure index scan and a seq scan, and it's also how Postgres combines two indexes (you'll see BitmapAnd / BitmapOr nodes). A bitmap scan is usually fine and often optimal.

Why the Planner Sometimes Ignores Your Index

This trips people up: you created an index, but EXPLAIN still shows a seq scan, so you assume something's broken. Usually nothing is. If a query returns a large fraction of the table — say more than 5-10% of rows — a sequential scan is genuinely faster than millions of random index lookups plus heap fetches, and the planner chooses correctly. This is low selectivity: the index doesn't narrow things enough to be worth it. An index on a boolean, or a status column where 90% of rows share one value, is mostly useless for that common value (a partial index is the fix — keep reading).


Composite Indexes and Why Column Order Is Everything

A multi-column index indexes the combination of columns, in the order you list them:

sql
CREATE INDEX idx_orders_customer_status  ON orders (customer_id, status);

The critical rule is the leftmost prefix. This index can serve:

  • WHERE customer_id = 4217 (uses the first column)
  • WHERE customer_id = 4217 AND status = 'paid' (uses both)

But it is essentially useless for:

  • WHERE status = 'paid' (skips the leading column)

Think of it like a phone book sorted by last name, then first name. You can find everyone named "Okafor", and "Okafor, Ada" specifically — but the book is no help at all for finding everyone whose first name is "Ada", because first name is the second sort key.

So which order? My rule of thumb:

  1. Equality columns before range columns. For WHERE customer_id = 4217 AND created_at > '2026-01-01', build (customer_id, created_at): equality narrows to the customer, then the range on created_at is contiguous. Reverse it to (created_at, customer_id) and the leading range scatters the customer match across the whole index.
  2. Most selective / most commonly filtered column first, subject to rule 1.
  3. Don't build both (a, b) and (a) — the composite already covers queries on a alone via its leftmost prefix. The standalone index on a is redundant weight.

Covering Indexes and Index-Only Scans

Normally an index scan is two steps: find the row's location in the index, then visit the table heap to fetch the columns. If the index already contains every column the query needs, Postgres skips the second step. That's an index-only scan — a real win for hot queries.

You can pad an index with extra columns using INCLUDE:

sql
CREATE INDEX idx_orders_customer_covering  ON orders (customer_id) INCLUDE (status, total_cents);

Now this query never touches the table:

sql
SELECT status, total_cents FROM orders WHERE customer_id = 4217;
text
Index Only Scan using idx_orders_customer_covering on orders  (actual time=0.024..0.026 rows=3 loops=1)  Index Cond: (customer_id = 4217)  Heap Fetches: 0

Heap Fetches: 0 is the goal. INCLUDE columns live only in the leaf nodes — they can't be searched or sorted on, but they're cheap to carry and don't bloat the tree's internal pages. Use INCLUDE for columns you SELECT but never filter or sort on. One caveat: index-only scans rely on the visibility map, so a heavily churned table that hasn't been vacuumed recently may still show some heap fetches.


Partial Indexes: Index Only What You Query

This is one of the most underused features in Postgres, and one of my favorites: you put a WHERE clause on the index itself so it includes only the rows you actually query.

Classic case — a "soft delete" column where almost every query filters out deleted rows:

sql
CREATE INDEX idx_users_email_active  ON users (email)  WHERE deleted_at IS NULL;

Or a job queue where you only ever look at pending work:

sql
CREATE INDEX idx_jobs_pending  ON jobs (created_at)  WHERE status = 'pending';

The payoffs are real. If 95% of your jobs are 'done', this index is roughly 20x smaller than a full one on status and faster to scan — and it rescues the low-selectivity problem from earlier: a status column that's useless to index normally becomes highly effective when you index only the rare, interesting value. The catch: the query's WHERE clause must clearly imply the index predicate. WHERE status = 'pending' matches; a parameterized WHERE status = $1 may not, because the planner can't prove $1 equals 'pending' at plan time.


Expression Indexes: When You Filter on a Function

Here's a bug I've watched senior engineers chase for an hour. You have an index on email, but this query does a seq scan:

sql
SELECT * FROM users WHERE lower(email) = 'phoenix@fynix.dev';

The index is on email, but you're searching on lower(email) — and to Postgres that's a different value. Wrapping a column in a function hides the raw column from its index. The fix is to index the expression itself:

sql
CREATE INDEX idx_users_lower_email ON users (lower(email));

Now the query uses the index. The rule generalizes: any function or transformation applied to an indexed column in your WHERE clause disables a plain index on that column. WHERE date(created_at) = '2026-06-05' won't use an index on created_at — rewrite it as a range (created_at >= '2026-06-05' AND created_at < '2026-06-06') or build an expression index. Same story with WHERE amount_cents / 100 > 50 or WHERE substr(code, 1, 3) = 'ABC'.


Indexing for ORDER BY ... LIMIT (a.k.a. Pagination)

Because B-trees are sorted, they can satisfy an ORDER BY without a separate sort step — but only if the sort direction matches. A feed query like this:

sql
SELECT * FROM postsWHERE author_id = 88ORDER BY created_at DESCLIMIT 20;

wants an index that's already in the right order:

sql
CREATE INDEX idx_posts_author_created  ON posts (author_id, created_at DESC);

The planner walks straight to author_id = 88, reads 20 entries already sorted newest-first, and stops — no sort step, no scanning the author's whole history. Without it, Postgres fetches every matching row and sorts them just to discard all but 20, which is brutal for prolific authors.

For deep pagination, drop OFFSET entirely. OFFSET 100000 still reads and discards 100,000 rows. Use keyset (cursor) pagination instead, which stays fast at any depth because it uses the same index to seek directly:

sql
SELECT * FROM postsWHERE author_id = 88  AND created_at < '2026-06-05 12:00:00'  -- cursor from the last rowORDER BY created_at DESCLIMIT 20;

GIN, Trigrams, and the Non-B-tree World (Briefly)

B-trees can't index "is this value somewhere inside this document/array/string". For that, Postgres has GIN (Generalized Inverted Index), which maps each contained element back to the rows holding it.

jsonb:

sql
CREATE INDEX idx_events_payload ON events USING gin (payload);-- serves containment queries:  WHERE payload @> '{"type": "signup"}'

Arrays:

sql
CREATE INDEX idx_articles_tags ON articles USING gin (tags);-- serves:  WHERE tags @> ARRAY['postgres']  or  tags && ARRAY['sql','db']

Full-text search:

sql
CREATE INDEX idx_docs_fts ON docs USING gin (to_tsvector('english', body));-- serves:  WHERE to_tsvector('english', body) @@ plainto_tsquery('indexing')

Substring / fuzzy LIKE with trigrams: A leading-wildcard LIKE '%phoenix%' can never use a normal B-tree. The pg_trgm extension breaks strings into three-character chunks and indexes those:

sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);-- now this can use an index:  WHERE name ILIKE '%phoenix%'

These are powerful but heavier to build and maintain than B-trees — use them when the access pattern demands it, not preemptively.


Indexes Aren't Free: The Cost of Over-Indexing

Every index makes reads faster and writes slower. That trade-off is the whole game, and beginners almost always over-index.

  • Write amplification: Every INSERT and every UPDATE touching an indexed column must update every relevant index. Ten indexes means an insert does eleven writes — that adds up fast on a write-heavy table.
  • Storage: Indexes can easily outweigh the table itself; I've seen indexes 3x the size of the data.
  • Bloat: Under MVCC, updates and deletes leave dead tuples in both the table and its indexes until VACUUM reclaims them. More indexes, more places for bloat.
  • Planning time: More candidate indexes, more time spent choosing a plan.

My discipline: index the read patterns your app actually runs, not every column — especially the hot queries and the columns in WHERE / JOIN / ORDER BY clauses. Don't speculatively index a column because it "might" get queried someday.


Why Isn't My Index Being Used? A Checklist

When you're sure an index should apply but EXPLAIN disagrees, it's almost always one of these:

  1. Type mismatch. Comparing a bigint column to a text literal, or a varchar to an integer, can force an implicit cast that hides the column from its index. Match types exactly.
  2. A function on the column. lower(col), date(col), col::text — covered above. Index the expression or rewrite the predicate.
  3. Leading wildcard LIKE. LIKE '%foo' or LIKE '%foo%' can't use a normal B-tree. Use trigrams.
  4. OR conditions across different columns. WHERE a = 1 OR b = 2 sometimes can't use a single index well; the planner may seq scan. Two separate indexes (which Postgres combines with a BitmapOr), or rewriting as a UNION, often helps.
  5. Low selectivity. The query matches too much of the table; a seq scan really is faster. Not a bug.
  6. Stale statistics. The planner relies on table stats to estimate row counts. After a big bulk load or mass update, those go stale and plans go bad. Run ANALYZE:
sql
ANALYZE orders;

When estimated rows and actual rows in your EXPLAIN ANALYZE output diverge by orders of magnitude, suspect stale stats first.


Maintenance: Finding Dead Weight and Building Safely

Find unused indexes

Postgres tracks how often each index is used in pg_stat_user_indexes. Those with idx_scan = 0 are removal candidates — they cost writes and storage while earning nothing:

sql
SELECT  schemaname,  relname              AS table_name,  indexrelname         AS index_name,  idx_scan,  pg_size_pretty(pg_relation_size(indexrelid)) AS index_sizeFROM pg_stat_user_indexesWHERE idx_scan = 0  AND indexrelid NOT IN (SELECT conindid FROM pg_constraint)ORDER BY pg_relation_size(indexrelid) DESC;

The pg_constraint exclusion keeps you from flagging indexes backing primary keys and unique constraints. Also confirm the stats have been accumulating long enough to be meaningful — an index used only by a monthly report will look unused on any given Tuesday.

Rebuild bloated indexes

Over time, churn bloats indexes and slows them. REINDEX rebuilds them; on Postgres 12+, do it without locking out writes:

sql
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

Build indexes without taking down production

This one bites people. A plain CREATE INDEX takes an ACCESS EXCLUSIVE-style lock that blocks writes to the table for the entire build — on a large, busy table, that's minutes of downtime. Always use CONCURRENTLY in production:

sql
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

It builds without blocking reads or writes. The trade-offs: it's slower, can't run inside a transaction block, and if it fails partway it leaves an INVALID index you must drop (also CONCURRENTLY) and rebuild — find them with SELECT ... FROM pg_index WHERE NOT indisvalid.


The Practical Checklist

When a query is slow, this is the loop I run:

  • Run EXPLAIN (ANALYZE, BUFFERS) — never guess. Look for Seq Scan on big tables and Rows Removed by Filter.
  • Check estimated vs actual rows. Big gap means ANALYZE the table.
  • Index the columns in WHERE, JOIN, and ORDER BY — equality columns first, range columns last.
  • Match ORDER BY direction in the index for sorted LIMIT queries; prefer keyset over OFFSET for deep pagination.
  • Consider a partial index when most rows share one value.
  • Consider INCLUDE columns to get an index-only scan on a hot query.
  • Watch for index-killers: functions on columns, type mismatches, leading-wildcard LIKE.
  • Reach for GIN (jsonb, arrays, full-text) and pg_trgm (substring search) only when B-trees can't do the job.
  • Audit pg_stat_user_indexes periodically and drop the dead weight.
  • Build with CREATE INDEX CONCURRENTLY on any table that takes live traffic.

Indexing isn't magic and it isn't guesswork. It's reading the plan, understanding what the planner is choosing and why, and giving it a better option. Do that consistently and the 2am pages mostly stop.


Further Reading

  • — the canonical reference, including all the index types.
  • — how to read query plans properly.
  • — the best free resource on SQL indexing, database-agnostic but deeply applicable to Postgres.
  • — paste an EXPLAIN ANALYZE plan and get it visualized with the slow nodes highlighted.
← All postsShare on X
SELECT * FROM orders WHERE customer_id = 4217;
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
EXPLAIN (ANALYZE, BUFFERS)SELECT * FROM orders WHERE customer_id = 4217;
Seq Scan on orders  (cost=0.00..35819.00 rows=12 width=84)                    (actual time=0.41..182.30 rows=14 loops=1)  Filter: (customer_id = 4217)  Rows Removed by Filter: 1999986  Buffers: shared hit=192 read=18627Planning Time: 0.094 msExecution Time: 182.34 ms
Index Scan using idx_orders_customer_id on orders  (cost=0.43..40.55 rows=12 width=84)  (actual time=0.028..0.071 rows=14 loops=1)  Index Cond: (customer_id = 4217)  Buffers: shared hit=4Execution Time: 0.103 ms
Bitmap Heap Scan on orders  (cost=12.05..1834.20 rows=4800 width=84)                            (actual time=0.21..1.88 rows=5120 loops=1)  Recheck Cond: (status = 'pending'::text)  Heap Blocks: exact=478  ->  Bitmap Index Scan on idx_orders_status  (cost=0.00..11.93 rows=4800 width=0)                                              (actual time=0.14..0.14 rows=5120 loops=1)        Index Cond: (status = 'pending'::text)
CREATE INDEX idx_orders_customer_status  ON orders (customer_id, status);
CREATE INDEX idx_orders_customer_covering  ON orders (customer_id) INCLUDE (status, total_cents);
SELECT status, total_cents FROM orders WHERE customer_id = 4217;
Index Only Scan using idx_orders_customer_covering on orders  (actual time=0.024..0.026 rows=3 loops=1)  Index Cond: (customer_id = 4217)  Heap Fetches: 0
CREATE INDEX idx_users_email_active  ON users (email)  WHERE deleted_at IS NULL;
CREATE INDEX idx_jobs_pending  ON jobs (created_at)  WHERE status = 'pending';
SELECT * FROM users WHERE lower(email) = 'phoenix@fynix.dev';
CREATE INDEX idx_users_lower_email ON users (lower(email));
SELECT * FROM postsWHERE author_id = 88ORDER BY created_at DESCLIMIT 20;
CREATE INDEX idx_posts_author_created  ON posts (author_id, created_at DESC);
SELECT * FROM postsWHERE author_id = 88  AND created_at < '2026-06-05 12:00:00'  -- cursor from the last rowORDER BY created_at DESCLIMIT 20;
CREATE INDEX idx_events_payload ON events USING gin (payload);-- serves containment queries:  WHERE payload @> '{"type": "signup"}'
CREATE INDEX idx_articles_tags ON articles USING gin (tags);-- serves:  WHERE tags @> ARRAY['postgres']  or  tags && ARRAY['sql','db']
CREATE INDEX idx_docs_fts ON docs USING gin (to_tsvector('english', body));-- serves:  WHERE to_tsvector('english', body) @@ plainto_tsquery('indexing')
CREATE EXTENSION IF NOT EXISTS pg_trgm;CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);-- now this can use an index:  WHERE name ILIKE '%phoenix%'
ANALYZE orders;
SELECT  schemaname,  relname              AS table_name,  indexrelname         AS index_name,  idx_scan,  pg_size_pretty(pg_relation_size(indexrelid)) AS index_sizeFROM pg_stat_user_indexesWHERE idx_scan = 0  AND indexrelid NOT IN (SELECT conindid FROM pg_constraint)ORDER BY pg_relation_size(indexrelid) DESC;
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);