An architecture review that doesn't hurt, doesn't work.

The good ones make technical teams defensive. That's how you know they're working.

Why Most Architecture Reviews Don't Work

A good architecture review isn't a pleasant one. The reviews that actually move the needle are the ones that make technical teams defensive, not because the reviewers are combative, but because they found something real. Something the team already half-knew but hadn't said out loud.

Most organizations run reviews that feel thorough. They produce documentation, generate diagrams, and result in slide decks with amber RAG statuses. But they don't change decisions. They don't surface the failure modes that actually keep systems down at 2am. They audit what's been written, not what's actually running.

 

A review that doesn't hurt, doesn't work. If your last review didn't make anyone uncomfortable, you ran a documentation audit. Not an architecture review.

 

The Pattern Is Consistent

Across architecture reviews in organizations of all sizes, from scaling startups to global enterprises, three findings appear with near-universal regularity. They're not edge cases. They're not signs of a poorly run engineering org. They're the natural accumulation of systems that grew under pressure, decisions made in hallways, and debt deferred one quarter at a time. The question isn't whether these findings exist in your architecture. It's whether your last review was honest enough to surface them.

 

FINDING #1

Hidden Dependencies

The "not important" system that's actually a single point of failure for 8 critical business processes.

 

Not in the Runbook

The system exists, does its job quietly, and no one thinks to include it in incident response documentation. Until it fails.

Not in the DR Plan

Disaster recovery was designed around the systems everyone knew mattered. This one slipped through, categorized as "low priority" years ago and never reassessed.

Not in the Asset Register

It's running on hardware that predates the current infrastructure team. Nobody owns it. Nobody's responsible for patching it. Everyone assumes someone else is watching it.

Down for 4 Hours

That's when you discover eight business-critical processes silently depended on it. The postmortem writes itself, and it's not flattering.

 

The Dependency You Don't Know About Is the One That Brings You Down

Hidden dependencies don't form because engineers are careless. They form because systems evolve. A utility service gets reused. An internal API gets called by something new. A batch job becomes load-bearing infrastructure. Over three or four years of organic growth, the dependency map diverges completely from the architecture diagram on the wiki.

The failure mode is always the same: an incident reveals a dependency that nobody mapped, nobody monitored, and nobody included in the blast radius calculation. By the time you're in the postmortem, you're not asking why the system failed. You're asking why you didn't know it mattered.

 

Map What's Actually Calling What

Use runtime dependency tracing, not architecture diagrams. What's on the diagram is rarely what's running in production.

Classify by Impact, Not by Tier

Stop categorizing systems by what they were designed to do. Classify by what breaks if they go down for 30 minutes.

Review the "Boring" Systems First

The highest-risk systems in your architecture are the ones nobody talks about in sprint planning.
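To make "classify by impact" concrete, here is a minimal sketch of a blast-radius calculation over runtime call edges. Everything in it is an assumption for illustration: the service names, the `CALLS` list (standing in for whatever your tracing or flow logs export), and the `CRITICAL` set of business-critical entry points.

```python
from collections import defaultdict

# Hypothetical runtime call edges (caller -> callee), e.g. exported from
# tracing data or network flow logs. All names are illustrative.
CALLS = [
    ("checkout", "payments-api"),
    ("payments-api", "legacy-batch"),
    ("invoicing", "legacy-batch"),
    ("reporting", "invoicing"),
    ("payments-api", "auth"),
]

# Assumed set of business-critical entry points.
CRITICAL = {"checkout", "invoicing", "reporting"}


def blast_radius(calls, critical):
    """Map each system to the critical processes that transitively depend on it."""
    callers = defaultdict(set)  # callee -> direct callers
    systems = set()
    for caller, callee in calls:
        callers[callee].add(caller)
        systems.update((caller, callee))

    impact = {}
    for system in systems:
        # Walk upstream through everything that calls this system,
        # directly or indirectly. The seen set also guards against cycles.
        seen, stack = set(), [system]
        while stack:
            node = stack.pop()
            for upstream in callers.get(node, ()):
                if upstream not in seen:
                    seen.add(upstream)
                    stack.append(upstream)
        impact[system] = sorted((seen | {system}) & critical)
    return impact


if __name__ == "__main__":
    ranked = sorted(blast_radius(CALLS, CRITICAL).items(),
                    key=lambda item: len(item[1]), reverse=True)
    for system, processes in ranked:
        print(f"{system}: {len(processes)} critical process(es) -> {processes}")
```

In this made-up data the "boring" `legacy-batch` job ranks at the top, because all three critical processes reach it through other services. That is the kind of result a tier-based classification tends to miss.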

 

FINDING #2

Architectural Decisions Never Actually Made

Three years ago, the team "decided" on microservices. There's no document explaining why. No criteria for when it's acceptable to break the pattern. No named authority to change it. Every new architect reinterprets from scratch.

What a Real Decision Looks Like

Documented Context

What problem were we solving? What constraints existed at the time? What alternatives were rejected and why?

Explicit Boundaries

Where does this decision apply? When is it acceptable to deviate? Who has the authority to grant an exception?

Named Ownership

A decision without an owner isn't a decision; it's a suggestion. Someone has to be accountable for whether this still makes sense in 18 months.

What Most Teams Actually Have

Tribal Memory

"We went microservices because of the 2021 scaling incident." Ask three engineers. Get three different stories.

Undocumented Exceptions

The monolith that "doesn't count." The shared database that everyone knows about but nobody officially sanctioned.

Reinterpretation by Default

Each new architect fills in the gaps with their own judgment. The architecture drifts without anyone making the call to change it.

 

Undecided Architecture Doesn't Stay Stable. It Drifts

Architecture Decision Records (ADRs) exist precisely because institutional memory is unreliable. When a decision lives only in the heads of the engineers who made it, it has a half-life. People leave. Context gets compressed into a three-word origin story. The nuance (the constraints, the trade-offs, the conditions that made the decision right) disappears.

What remains is a pattern with no justification. New engineers inherit it, assume it's optimal, and extend it into contexts it was never designed to handle. The architecture doesn't break overnight. It degrades through a thousand small decisions, each locally reasonable, none of them clearly wrong, but collectively steering the system somewhere nobody intended to go.
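One way to keep a decision from decaying into a pattern with no justification is to treat the record as data and check it for gaps. The sketch below is illustrative only: the field names are hypothetical and not taken from any standard ADR template, and the `gaps` helper simply mirrors the three properties described above (context, boundaries, ownership) plus a review date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DecisionRecord:
    """Hypothetical shape for a decision record; field names are assumptions."""
    title: str
    context: str = ""              # problem, constraints, rejected alternatives
    applies_to: str = ""           # where the decision applies
    exception_authority: str = ""  # who can approve a deviation
    owner: str = ""                # who is accountable for revisiting it
    review_by: Optional[date] = None


def gaps(record, today):
    """Return the reasons this record is not yet a real decision."""
    problems = []
    if not record.context.strip():
        problems.append("no documented context")
    if not record.applies_to.strip():
        problems.append("no explicit boundaries")
    if not record.exception_authority.strip():
        problems.append("no exception authority")
    if not record.owner.strip():
        problems.append("no named owner")
    if record.review_by is None:
        problems.append("no review date")
    elif record.review_by < today:
        problems.append("review overdue")
    return problems


if __name__ == "__main__":
    # The "tribal memory" case: a title and a three-word origin story.
    microservices = DecisionRecord(
        title="Adopt microservices",
        context="2021 scaling incident",
    )
    for problem in gaps(microservices, date.today()):
        print(f"{microservices.title}: {problem}")
```

A record that passes this check can still be a bad decision. The check only confirms that someone can be asked about it.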

FINDING #3

Technical Debt That Became Business Debt

That 6-week refactor from three years ago is now an 18-month program. Not because the code got worse, but because every change now requires testing across 14 related systems. The cost of change has exceeded the cost of replacement.

Year 1: Known tech debt. Estimated at 6 weeks. Deferred to next quarter; there's a roadmap commitment.

Year 2: Still deferred. Now 12 weeks. Three new services depend on the legacy component. Test coverage is sparse.

Year 3: Estimate: 6 months. Change freeze required. 14 downstream systems need regression testing. Business now owns the risk.

Year 4: Program budget: 18 months, cross-functional team. Cost of replacement is lower than cost of change. It's no longer a tech problem.

 

When Technical Debt Becomes a Business Problem

The Real Metric Nobody Tracks

Engineering teams track debt as lines of code, test coverage percentage, or the number of Jira tickets in a backlog labeled "tech debt." None of those metrics tell you what actually matters: the cost of making a change.

When a two-day feature requires three weeks of regression testing across fourteen systems, that's not technical debt. That's business debt. The product team can't move fast. The sales team can't commit to delivery timelines. The CTO can't respond to competitive pressure. The debt has migrated off the engineering balance sheet and onto the business P&L.

Three Warning Signs

  1. Change Radius Keeps Growing. Small features now require touching systems that shouldn't be related. Coupling is invisible until you try to move.

  2. Estimates Are No Longer Reliable. Engineers add buffers. Buffers get exceeded. The team has learned that surprises are structural, not accidental.

     

  3. Nobody Knows the Safe Path. When senior engineers disagree on the right refactoring approach, you've crossed from technical debt into architectural ambiguity.
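The first warning sign can be tracked with very little tooling. As a rough sketch, assuming you can export which systems each change touched (the `CHANGES` data, change IDs, and system names below are invented for illustration), the median number of systems touched per quarter gives a simple change-radius trend:

```python
from collections import defaultdict
from statistics import median

# Hypothetical change log: (quarter, change id, systems touched).
CHANGES = [
    ("2023-Q1", "F-101", {"billing"}),
    ("2023-Q1", "F-102", {"billing", "auth"}),
    ("2023-Q2", "F-110", {"billing", "auth", "reporting"}),
    ("2023-Q2", "F-111", {"billing", "catalog", "auth", "search"}),
    ("2023-Q3", "F-120", {"billing", "auth", "reporting", "catalog", "search"}),
]


def change_radius_by_quarter(changes):
    """Median number of systems touched per change, keyed by quarter."""
    sizes = defaultdict(list)
    for quarter, _change_id, systems in changes:
        sizes[quarter].append(len(systems))
    # "YYYY-Qn" labels sort chronologically as plain strings.
    return {quarter: median(values) for quarter, values in sorted(sizes.items())}


def is_growing(radius):
    """True if the radius rose in every consecutive quarter."""
    values = list(radius.values())
    return len(values) > 1 and all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    radius = change_radius_by_quarter(CHANGES)
    for quarter, value in radius.items():
        print(f"{quarter}: median systems touched = {value}")
    print("change radius growing:", is_growing(radius))
```

The median is used here so that one unusually broad change doesn't dominate a quarter; a mean or a percentile would be an equally defensible choice.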

     

The Anatomy of an Architecture Review That Actually Works

Most reviews audit documentation. A real architecture review stress-tests assumptions. Here's what separates one from the other.

Interview the People, Not the Docs

Ask three engineers independently how a critical process works. Where the stories diverge is where your risk lives. Documentation reflects intent; engineers reflect reality.

Simulate Failure Scenarios

Don't ask what the DR plan says. Ask what actually happens when System X goes down at 11pm on a Friday. The gap between the plan and the answer is your exposure.

Measure the Cost of Change

Pick three recent features. Calculate total engineer-hours including testing, coordination, and rework. Compare against initial estimates. That ratio tells you more than any architecture diagram.

Audit Decisions, Not Diagrams

For every major architectural pattern in the system, ask: who made this decision, when, based on what constraints, and who has authority to change it today? Silence is a finding.
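The cost-of-change comparison described above is simple arithmetic. Here is a minimal sketch, with hypothetical feature names and hours, and an assumed split of effort into build, testing, coordination, and rework:

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hours for one delivered feature; all values here are made up."""
    name: str
    estimated_hours: float
    build_hours: float
    testing_hours: float
    coordination_hours: float
    rework_hours: float

    @property
    def actual_hours(self):
        return (self.build_hours + self.testing_hours
                + self.coordination_hours + self.rework_hours)


def cost_of_change_ratio(features):
    """Total actual engineer-hours divided by total estimated hours."""
    estimated = sum(f.estimated_hours for f in features)
    if estimated <= 0:
        raise ValueError("need a positive total estimate to compute a ratio")
    return sum(f.actual_hours for f in features) / estimated


# Three hypothetical recent features.
RECENT = [
    Feature("export-to-csv", 16, 16, 120, 8, 0),
    Feature("new-pricing-tier", 40, 48, 60, 12, 20),
    Feature("sso-login", 24, 24, 40, 4, 4),
]

if __name__ == "__main__":
    for feature in RECENT:
        ratio = feature.actual_hours / feature.estimated_hours
        print(f"{feature.name}: {feature.actual_hours}h actual "
              f"vs {feature.estimated_hours}h estimated ({ratio:.1f}x)")
    print(f"overall cost-of-change ratio: {cost_of_change_ratio(RECENT):.2f}")
```

With these invented numbers the overall ratio comes out at 4.45, and most of the overrun sits in testing rather than in building the feature. Three features is a small sample, so treat the result as a conversation starter rather than a benchmark.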

 

The Question That Cuts Through Everything

When was the last architecture review at your organization that actually changed a decision?

Not a review that produced a report. Not one that updated a wiki page or generated a set of recommendations that lived in a shared drive. A review that resulted in a specific technical decision being reversed, deferred, or accelerated, with a named owner and a follow-up date.

If you're struggling to answer that, you're not alone. Most organizations have review processes that are built for compliance, not candor. They're designed to demonstrate due diligence, not to find problems. The incentive structure rewards teams that come in with clean diagrams and confident answers, not teams that surface the uncomfortable truths that actually need to surface.

 

A review process that makes everyone feel good is a process that protects the status quo. Architecture reviews should create productive discomfort, not serve as validation ceremonies for decisions already made.

 

The three findings outlined here (hidden dependencies, undocumented decisions, and debt that became a business constraint) aren't diagnostic curiosities. They're the architectural failure modes that show up in incident reports, missed release dates, and re-platforming programs that cost ten times what the original refactor would have. Find them before they find you.