Sendil Kumar N

Stop Reviewing Code. Start Reviewing Intent.

Is your team drowning in code reviews? You're not alone.

Code review was designed for a world where writing code was slow and reading it was fast. One engineer, one afternoon, 200 lines (maybe more with coffee 😉). Easy to skim, easy to review.

Agents broke that math.

Now that engineer can produce 2,000+ lines (10x, really!) before their coffee gets cold. And the person reviewing it? Still a human, reading at human speed. This is where the old code review system breaks (and no, !q/!w won't scale).

The bottleneck isn't how we review. It's what we're reviewing.

Stop optimizing the queue. Redesign what review even means.

Shift 1: The author owns correctness, review owns judgment

Old model: the reviewer checks if the code is correct, well-tested, follows patterns, and handles edge cases.

New model: the LLM and CI prove correctness. The human only weighs in on what machines genuinely can't judge. Does this belong in the codebase at all? Does the abstraction match where we're heading? Is this even the right problem to be solving? (And yes, that list is shrinking.)

Before a PR is human-eligible, the author must have:

  • Run an LLM review pass on their own diff and addressed every comment with a note.
  • Generated and run tests covering the new behavior, with mutation testing or coverage-on-diff gating (a gate sketch follows below).
  • Written a PR description the LLM helped produce: what changed, why, what was rejected, what the blast radius is, and a short summary at the top.

If those three things aren't done, the PR isn't ready for a human. The cheap work (line-by-line correctness) belongs to machines. Humans step in when judgment is actually needed.
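
To make "coverage-on-diff gating" concrete, here's a minimal sketch in Python. It assumes a diff produced by `git diff -U0 origin/main... > pr.diff` and a coverage.py JSON report from `coverage json -o coverage.json`; the file names and the 90% threshold are placeholders, not a standard:

```python
#!/usr/bin/env python3
"""Coverage-on-diff gate: fail CI if the lines a PR adds are untested.

Sketch only. Assumes `git diff -U0 origin/main... > pr.diff` and a
coverage.py JSON report (`coverage json -o coverage.json`). The 90%
threshold is an example, not a recommendation.
"""
import json
import re
import sys

HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def added_lines(diff_path):
    """Map each changed file to the set of line numbers the diff added."""
    added, current = {}, None
    for line in open(diff_path, encoding="utf-8"):
        if line.startswith("+++ "):
            target = line[4:].strip()
            # "+++ b/path" marks the new file; "+++ /dev/null" means deleted
            current = target[2:] if target.startswith("b/") else None
            if current:
                added[current] = set()
        elif current and (m := HUNK.match(line)):
            start, count = int(m.group(1)), int(m.group(2) or "1")
            added[current].update(range(start, start + count))
    return added

def main(diff_path="pr.diff", cov_path="coverage.json", threshold=0.90):
    cov = json.load(open(cov_path, encoding="utf-8"))["files"]
    new = covered = 0
    for path, lines in added_lines(diff_path).items():
        if path not in cov:  # docs, configs, etc. are not gated
            continue
        executed = set(cov[path]["executed_lines"])
        new += len(lines)
        covered += len(lines & executed)
    ratio = covered / new if new else 1.0
    print(f"diff coverage: {covered}/{new} added lines ({ratio:.0%})")
    sys.exit(0 if ratio >= threshold else 1)

if __name__ == "__main__":
    main()
```

Run it as the last step of CI and the "is the new code tested?" question is answered before a human ever opens the diff.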

Shift 2: Review the intent, not the diff

When an AI has written 800 lines, reading all 800 is the wrong unit of work. Reviewers read:

  • The PR description and the "why"
  • The test cases, because tests describe intent more compactly than implementation
  • The interface changes: public APIs, schemas, contracts
  • Spot-checks of implementation, only where the tests or description raised a flag

A reviewer asking "did the AI hallucinate a library?" is doing work the LLM should have done. A reviewer asking "do we actually want this feature, and is this the right shape for it?" is doing work only they can do.

Reviews get radically faster (minutes, not hours) because reviewers are working at the design layer, not the syntax layer.

Shift 3: Trust tiers with teeth

Define three buckets. Write them down. Apply them with tooling (a routing sketch follows below).

  • Auto-merge on green: Generated code in well-tested areas, dependency bumps, copy changes, internal refactors with no interface change. Green CI + LLM review pass = merged. No human in the loop.
  • One-look review: Standard feature work, bug fixes with tests, anything touching shared code. One human, 5-minute review focused on intent and interfaces. Auto-merge after approval.
  • Deep review: New services, auth/data/billing, public APIs, anything with security or migration implications, anything the author flags as uncertain. Two humans, synchronous discussion encouraged, no rush.

Bucket 1 should be 40 to 60% of PRs on an AI-augmented team. If it's not, your CI and test coverage aren't strong enough yet. Fix that before fixing review.
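
To apply the buckets with tooling rather than vibes, compute the tier from what the PR touches. A Python sketch, with invented path patterns standing in for your repo's actual risky areas:

```python
"""Trust-tier routing: decide how much human review a PR needs.

A sketch, not a product: the path patterns and parameters are invented
examples; replace them with your repo's real risky areas.
"""
from fnmatch import fnmatch

DEEP_PATHS = ["auth/*", "billing/*", "migrations/*", "api/public/*"]
SHARED_PATHS = ["lib/*", "shared/*"]

def tier(changed_files, author_flags_uncertain=False, interface_change=False):
    """Return 'auto-merge', 'one-look', or 'deep' for a list of changed paths."""
    if author_flags_uncertain or any(
        fnmatch(f, p) for f in changed_files for p in DEEP_PATHS
    ):
        return "deep"        # two humans, synchronous discussion, no rush
    if interface_change or any(
        fnmatch(f, p) for f in changed_files for p in SHARED_PATHS
    ):
        return "one-look"    # one human, five minutes, intent and interfaces
    return "auto-merge"      # green CI + LLM review pass is enough

# Example: a refactor confined to one module lands in the safe tier.
print(tier(["services/email/sender.py", "services/email/test_sender.py"]))
```

Wire the result into branch protection or your merge bot; the point is that the tier is computed, not negotiated per PR.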

Shift 4: Move the bottleneck upstream

The highest-leverage review isn't on the PR. It's on the prompt and the plan. Five minutes agreeing on the approach before the AI writes the code saves an hour of review afterward. Build the habit:

  • Short design notes (sometimes just a Slack thread) before any non-trivial work begins
  • The author shares what they're about to ask the LLM to build, and gets a quick "yes, that shape" or "no, think about X" from a teammate
  • The eventual PR becomes a confirmation, not a discovery

For a small team this is nearly free: a 10-minute conversation that prevents a 2-hour review backlog.

Shift 5: Observability replaces gatekeeping

For anything you can flag, monitor, and revert, the question isn't "is this PR perfect?" It's "can we safely roll this back if it isn't?" Invest in:

  • Feature flags as the default for user-facing changes (sketched below)
  • Per-PR preview environments
  • Strong production monitoring with diff-aware alerts
  • One-command revert that actually works

When revert is cheap, review can be lighter without quality dropping. Most teams over-invest in pre-merge review and under-invest in post-merge safety. AI volume makes that imbalance fatal.
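
To make "feature flags as the default" concrete, here's a toy sketch with an in-memory flag store; a real team would back it with a service like LaunchDarkly or Unleash. Every name here is illustrative:

```python
"""Feature-flag-by-default: new user-facing code paths ship dark.

Toy sketch with an in-memory store; the flag and function names
are invented for illustration.
"""
FLAGS = {"new_checkout_flow": False}  # flipped on (or back off) without a deploy

def flagged(name, fallback):
    """Route calls to the new implementation only while its flag is on."""
    def decorate(new_impl):
        def wrapper(*args, **kwargs):
            impl = new_impl if FLAGS.get(name) else fallback
            return impl(*args, **kwargs)
        return wrapper
    return decorate

def old_checkout(cart):
    return f"old checkout, {len(cart)} item(s)"

@flagged("new_checkout_flow", fallback=old_checkout)
def checkout(cart):
    return f"new checkout, {len(cart)} item(s)"

print(checkout(["book"]))             # flag off: old path
FLAGS["new_checkout_flow"] = True
print(checkout(["book"]))             # flag on: new path; revert is one flip
```

When the new path misbehaves, rollback is a flag flip rather than a deploy, which is exactly what makes lighter pre-merge review safe.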

Next steps

  • Write down the three trust tiers and put them in your repo's CONTRIBUTING.md
  • Require an LLM review pass before any human review. Make it a PR template checkbox.
  • Set up auto-merge on green for the safe tier
  • Start every non-trivial task with a 5-minute design alignment, not a PR
  • Audit your test suite. Is it strong enough to actually trust the safe tier? If not, that's the project for the next two weeks.

One thing to be honest with your team about: AI-augmented coding only stays a win if you don't drown in the output. The discipline is to write less, ship smaller, and use the LLM's leverage on quality (tests, review, design feedback) at least as much as on volume. A team that ships 10x the code but the same amount of value has just created 10x the maintenance burden for future-you.

Curious what others are seeing. Is your review queue better or worse than a year ago?